Preface



The 28th International Colloquium on Automata, Languages and Programming 
(ICALP 2001) was held July 8-12, 2001 in the Aldemar-Knossos Royal Village 
near Hersonissos on Crete, Greece. This volume contains all contributed papers 
presented at ICALP 2001, together with the invited lectures by Ahmed Bouajjani
(Paris), Martin Große-Rhode (Berlin), Mogens Nielsen (Aarhus), and Ingo
Wegener (Dortmund) and two of the keynote lectures, by Christos Papadimitriou
and Boris Trakhtenbrot.

For almost 30 years now, ICALP has been the main annual event of the 
European Association for Theoretical Computer Science (EATCS). The ICALP 
program currently consists of track A: Algorithms, Automata, Complexity, and 
Games and track B: Logic, Semantics, and Theory of Programming.

In response to the Call for Papers, the program committee received 208 submissions:
162 for track A and 46 for track B. The committee met on March 23/24,
2001 in Barcelona and selected 80 papers for inclusion in the scientific program.
The selection was based on originality, quality, and relevance to theoretical computer
science. We wish to thank all authors who submitted extended abstracts
for consideration, and all 366 referees who helped in the extensive evaluation 
process. The program committee for ICALP 2001 consisted of: 

TRACK A 

Maxime Crochemore (Marne-la-Vallée) José Rolim (Geneva)

Leslie A. Goldberg (Warwick) Peter Sanders (Saarbrücken)

Mordecai J. Golin (Hong Kong) Erik M. Schmidt (Aarhus)

Juraj Hromkovič (Aachen) Maria Serna (Barcelona)

Giuseppe F. Italiano (Rome) Jack Snoeyink (Chapel Hill)

Viggo Kann (Stockholm) Athanasios K. Tsakalidis (Patras)

Luděk Kučera (Prague) Jan van Leeuwen (Utrecht, chair)

Bill McColl (Oxford and Sychron Inc.) Dorothea Wagner (Konstanz) 

David Peleg (Rehovot) 

TRACK B

Samson Abramsky (Oxford) Eugenio Moggi (Genova)

Kim B. Bruce (Williamstown) Ugo Montanari (Pisa)

Stavros Cosmadakis (Patras) Damian Niwiński (Warsaw)

Hartmut Ehrig (Berlin) Fernando Orejas (Barcelona, chair)

Javier Esparza (München) Catuscia Palamidessi (Penn State)

Thomas A. Henzinger (Berkeley) Andreas Podelski (Saarbrücken)

Jean-Pierre Jouannaud (Orsay) Hanne Riis Nielson (Lyngby)

José Meseguer (SRI Menlo Park)

The EATCS Best Paper Award was given to William Hesse (Amherst MA, also
Best Student Paper) and to Parosh Aziz Abdulla, Luc Boasson, and Ahmed
Bouajjani (Uppsala/Paris) for their respective papers in ICALP 2001.

ICALP 2001 was a very special ICALP. Two other leading Computer Science
conferences were co-located with ICALP this time: the 13th Annual ACM Symposium
on Parallel Algorithms and Architectures (SPAA 2001), and the 33rd Annual
ACM Symposium on Theory of Computing (STOC 2001), giving the joint participants
an unprecedented opportunity in Europe to see the advances in a broad
spectrum of foundational computer science research. It was the first time ever
that STOC had been held outside of the USA.

During STOC 2001 and ICALP 2001 the following special events took place: 
(a) the Turing Award Lecture by Andrew C-C. Yao (Princeton), (b) the EATCS
Distinguished Service Award for Corrado Böhm (Rome), (c) the Greek Computer
Society/CTI Prize for Christos Papadimitriou (Berkeley) recognizing him as the
most influential scientist of Greek origin for his contributions to the foundations
of computer science, and (d) the Gödel Prize 2001 awarded to S. Arora, U.
Feige, S. Goldwasser, C. Lund, L. Lovász, R. Motwani, S. Safra, M. Sudan,
and M. Szegedy for their fundamental papers on PCPs and the complexity of
approximation problems. Special attention was given to the 80th birthday of
B.A. Trakhtenbrot (Tel Aviv). 

Several high-level workshops were held as ‘satellite events’ of ICALP 2001, 
with Christos Zaroliagis (Patras) as coordinator. This included the following 
workshops: Algorithmic Methods and Models for Optimization of Railways (ATMOS
2001), Böhm’s Theorem: Applications to Computer Science Theory (BOTH
2001), Graph Transformation and Visual Modeling Techniques (2nd Int. Workshop,
GT-VMT 2001), and Verification of Parameterized Systems (VEPAS 2001).
The scientific program of ICALP 2001 showed that theoretical computer science 
is a vibrant field, deepening our insights into the foundations and futures of 
computing and system design in many modern application areas. 

The sponsors of ICALP 2001 included the Information Society DG of the
European Commission, the Ministry of the Aegean, the Ministry of Culture,
and the Ministry of Education and Religious Affairs of Greece, the Hellenic
Pedagogical Institute, the Intracom Institute of Technology (IIT), CTI (Patras),
and the following companies: Intracom SA, Intralot SA, Intrasoft SA, ALTEC 
SA, MLS Multimedia SA, OPAP SA, Pliroforiki Technognosia Ltd, 01 Pliroforiki 
SA, Rainbow Computer SA, and Systema Informatics SA. 

We are very grateful to the Computer Technology Institute (CTI) of Patras
University for supporting the organization of ICALP 2001. The organizing
committee consisted of: C. Bouras (CTI), S. Bozapalidis (Thessaloniki), R. Efstathiadou
(CTI), C. Kaklamanis (CTI), M. Mavronicolas (Cyprus), C. Nikolaou
(ICS-FORTH Crete), S. Nikoletseas (CTI), P. Spirakis (CTI), A. Tsakalidis
(CTI), S. Zachos (Athens), and C. Zaroliagis (CTI). We thank Henk P. Penning
and Henk van Lingen, system administrators at the Institute of Information and
Computing Sciences of Utrecht University, for their outstanding effort in designing
and developing the CSCS electronic conference server for ICALP 2001.
Algorithms, Games, and the Internet
(Extended Abstract) 



Christos H. Papadimitriou 

Computer Science Division, UC Berkeley, Berkeley, CA 94720, USA, 
christos@cs.berkeley.edu,
http://www.cs.berkeley.edu/~christos

Over the past fifty years, researchers in Theoretical Computer Science have 
sought and achieved a productive foundational understanding of the von Neu- 
mann computer and its software, employing the mathematical tools of Logic and 
Combinatorics. The next half century appears now much more confusing (half- 
centuries tend to look like that in the beginning). What computational artifact 
will be the object of the next great modeling adventure of our field? And what 
mathematical tools will be handy in this endeavor? 

The Internet has arguably surpassed the von Neumann computer as the most 
complex computational artifact (if you can call it that) of our time. Of all the 
formidable characteristics of the Internet (its size and growth, its almost spon- 
taneous emergence, its open architecture, its unprecedented availability and uni- 
versality as an information repository, etc.), I believe that the most novel and 
defining one is its socio-economic complexity: The Internet is unique among 
all computer systems in that it is built, operated, and used by a multitude of 
diverse economic interests, in varying relationships of collaboration and compe- 
tition with each other. This suggests that the mathematical tools and insights 
most appropriate for understanding the Internet may come from a fusion of algo- 
rithmic ideas with concepts and techniques from Mathematical Economics and 
Game Theory (see for two excellent introductions in the respective subjects,
and see the web site www.cs.berkeley.edu/~christos/cs294.html for many
additional references to work at this interface).

This talk is a survey of some of the many important points of contact between 
Game Theory and Economic Theory, Theoretical CS, and the Internet.

Nash Equilibrium. Game theory was founded by von Neumann and Morgenstern
(in fact, about the same time von Neumann designed the EDVAC...) as
a general theory of rational behavior. The Nash equilibrium (definition omit- 
ted here) is the predominant concept of rationality in Game Theory; it is also 
a most fundamental computational problem whose complexity is wide open: 
Is there a polynomial algorithm which, given a two-person game with a finite 
strategy space, computes a mixed Nash equilibrium? Together with factoring, 
the complexity of finding a Nash equilibrium is in my opinion the most important 
concrete open question on the boundary of P today.

^ Invited talk presented at a joint session of ICALP 2001 and STOC 2001. The full 
version of this paper appeared in the Proceedings of STOC 2001. Research supported 
by the National Science Foundation of the U.S.A. 

F. Orejas, P.G. Spirakis, and J. van Leeuwen (Eds.): ICALP 2001, LNCS 2076, pp. 1-^ 2001. 

(c) Springer-Verlag Berlin Heidelberg 2001
In relation to the Internet, it would be most interesting to develop a game-theoretic
model of Internet congestion control from which (an appropriate approximation
of) TCP/IP emerges as a Nash equilibrium.

Coalitional Games. A coalitional game with $n$ players is an increasing function
$v : 2^{[n]} \to \mathbb{R}_+$, that is, a specification of the amount that each coalition of players
“deserves.” The fundamental issue in coalitional games is deciding whether a
proposed allocation to the players, a vector $x \in \mathbb{R}^n$ with $x[[n]] = v([n])$, is a
“fair” way for the $n$ players to split the loot in $v([n])$; here $x[S]$ denotes $\sum_{i \in S} x_i$.
A chief notion of fairness (among many others) is the core: A vector
$x \in \mathbb{R}^n$ with $x[[n]] = v([n])$ is in the
core if $x[S] \geq v(S)$ for all $S \subseteq [n]$.
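
The core condition can be checked by brute force for small games. The following is a minimal sketch, assuming players are numbered 0 to n-1 and the game is given as a Python function on frozensets; the names `in_core` and `toy_value` are illustrative and do not come from the paper. The enumeration is exponential in n, so it only serves to make the definition concrete.

```python
from itertools import combinations

def in_core(v, x, tol=1e-9):
    """Return True if allocation x is in the core of the coalitional game v.

    v: function from a frozenset of players (0..n-1) to a real value.
    x: list of payoffs, one per player.
    """
    n = len(x)
    grand = frozenset(range(n))
    # The allocation must split exactly v([n]).
    if abs(sum(x) - v(grand)) > tol:
        return False
    # No proper coalition S may receive less than v(S).
    for size in range(1, n):
        for members in combinations(range(n), size):
            if sum(x[i] for i in members) < v(frozenset(members)) - tol:
                return False
    return True

# Hypothetical 3-player game: a coalition is worth the square of its size.
def toy_value(coalition):
    return len(coalition) ** 2
```

In this toy game the equal split (3, 3, 3) satisfies every coalition, whereas (7, 1, 1) leaves the coalition of the last two players with 2, which is less than the 4 it "deserves".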

We can model the high-level operation of the Internet (the interaction of 
the “autonomous systems” that run it) as a coalitional game, as follows: We are 
given a graph with $n$ nodes (the autonomous systems); an $n \times n$ symmetric traffic
matrix $F$, where $F_{ij}$ is the total traffic requirement between customers of $i$ and
customers of $j$; and a capacity $c_i$ for each node $i$ (a simplification attempting to
capture the capacity of $i$'s subnetwork to carry traffic). If $S$ is a set of nodes,
consider the subgraph induced by $S$ as a multicommodity network with node
capacities and commodity requirements given by the entries of $F$; let $v(S)$ be
the maximum total flow in this network; notice that this defines a coalitional
game. 

The key problem here is this: Is there an optimum solution in the multicommodity
flow problem for the overall network, achieving a flow matrix $F' \leq F$,
such that the corresponding payoffs for the nodes, $x_i = \sum_j F'_{ij}$, are in the core of
the coalitional game $v$ (or abide by one of the other notions of fairness mentioned
above)?

The Price of Anarchy. There is no central authority that designs, engineers
and runs the Internet. But what if there were such a master puppeteer, a benevolent
Internet dictator who, for example, micromanaged its operation, allocating
bandwidth to flows so as to maximize total user satisfaction? How much better 
would the Internet run? What is the price of anarchy? 

This question was posed (and partially answered in the restricted context 
of a network consisting of two nodes and parallel edges) in [2]. More recently,
0 showed that, in the context of a general multicommodity flow network in 
which message delays increase with edge congestion while flows choose paths so 
as to minimize delay, the price of anarchy is two (more precisely, the anarchistic 
solution is no worse than the optimum solution with double the bandwidth). 

But, of course, in today’s Internet, flows cannot choose shortest paths. In the 
Internet, routers direct traffic based on local information, users respond to delay 
patterns by modifying their traffic, and network providers throw bandwidth at 
the resulting hot spots. How does this compare in efficiency with an ideal, ab 
initio optimum network design? What is the price of the Internet architecture? 

^ Recall David Clark’s famous maxim: “We reject kings, presidents and voting. We 
believe in rough consensus and running code.”
Mechanism Design. If Game Theory strives to understand rational behavior in
competitive situations, the goal of Mechanism Design (an important and elegant 
research tradition, very extensive in both scope and accomplishment, and one 
that could alternatively be called “inverse game theory”) is even grander: Given 
desired goals (such as to maximize a society’s total welfare), design a game 
(strategies and payoffs) in such a clever way that individual players, motivated 
solely by self-interest, end up achieving the designer’s goals. There have
recently been interesting interactions between this fascinating area and Theoretical
CS, see e.g. and further opportunities abound.

Rough Markets. The famous Arrow-Debreu Theorem states that, under rea- 
sonable conditions, in any market there is always a set of prices that clears the 
market (agents optimizing their basket end up buying a total amount of each 
good which exactly equals the sum of the endowments that each agent brought 
in the market). But if the goods are integer-valued, then such an equilibrium may
not exist. In recent joint work with Deng Xiaotie, we prove that it is NP-hard to 
tell if a price equilibrium exists even for very simple discrete markets; however, 
a price vector that clears markets approximately on the average (definition omit- 
ted) does exist and can be found in polynomial time — if the number of goods 
is bounded. 

Finally, three more areas of contact between Theoretical Computer Science and
Economics are discussed in the full paper: Economic aspects of privacy (algo- 
rithmic problems involved in computing the fair royalty for private information 
in various contexts), of clustering (how economic considerations can be a guide 
in the chaos of clustering criteria), and of the web graph (can the world-wide
web’s economic aspects explain its peculiar structure as a graph?). 
1 Introduction



Classical Automata Theory (AT) is mainly about devices that operate in discrete
time. Recent research has stimulated interest in, and the development of,
paradigms in which continuous time is involved, whether in a pure way or in
cooperation with discrete time. This development is particularly evident in the
area that covers the following three interrelated trends: automata, logic (arguing
about automata) and interaction (composition of automata).

Unfortunately, the area is dominated by a plethora of concepts, terminology
and notation, which is not free of ad-hoc and ambiguous decisions that are liable
to misjudgments. A wrong formal decision can be misleading in the long term even if
it turned out to be instrumental in concrete specific cases. It engenders further
models and formalisms, and it is not clear where to stop. Hence (quoting J.
Hartmanis) the challenge “to isolate the right concepts, to formulate the right
models, and to discard many others, that do not capture the reality we want
to understand...” A comprehensive undertaking of this challenge, covering the
three trends mentioned above, is a large-scale task. Here we confine ourselves to
some continuous-time paradigms that are suggested by the current literature on
hybrid automata and related control problems. This literature often uses
cumbersome wording to define intricate notions “from scratch”. For example,
here is a (slightly edited) extract from the definition of the core notion “hybrid
automaton” ([VdSS]):

“A hybrid automaton HA is described by a septuple $(L, X, A, W, E, Inv, Act)$
where the symbols have the following meanings:

$L$ is a finite set ... (of) locations,

$X$ is...

$A$ is...

$W$ is...

$E$ is a finite set of edges called transitions (or events). Every edge is defined
by a five-tuple $(l, a, Guard_{ll'}, Jump_{ll'}, l')$, where...

Inv is... 



Act is a mapping that associates to each location $l \in L$ a set of differential-algebraic
equations $F_l$... (Their) solutions are called the activities of the location.”

Before attacking such intricate models and tackling the related control is- 
sues, it is important to reach a clear understanding and, ultimately, to provide 
definitions for some basic proviso. Namely, 

(i) What are continuous-time (for short, continuous) automata and what is 
their purposeful classification? 

(ii) What are the relevant models of interaction and control for continuous 
automata, and how do they relate to the original counterparts in classical au- 
tomata theory? 

After this is agreed upon, the next question to be addressed is: 

(iii) What are (and what should be) hybrid systems (in particular, “hybrid 
automata”)? 

We believe that first, the agenda should build on mainstream, basic Automata 
Theory and related, well-understood tools. Hence, on the conceptual level there is 
no reason to involve abstract algebra and/or calculus, say differential equations. 
For example, instead of the specific activities (we call them also flows) and jumps 
that occur in the definition above, one would rely on the following 

Definition 1. 1. A jump $j$ on state-space $X$ is a function of type $X \to X$; it
is total if it is defined everywhere in $X$.

2. A flow $f$ on state-space $X$ is a function $f : X \times \mathbb{R}^{\geq 0} \to X$ that meets the
following conditions:

(i) $f(x, 0) = x$, and (additivity):

$f(x, t_1) = x' \wedge f(x', t_2) = x'' \rightarrow f(x, t_1 + t_2) = x''$,

(ii) if $f(x, t)$ is defined, so is $f(x, t')$ for each $0 < t' < t$,

(iii) the flow is total if $f(x, t)$ is defined for all $x \in X$, $t \in \mathbb{R}^{\geq 0}$; otherwise
it is partial. “nil” is the notation for the (polymorphic) trivial flow:
$\forall t\,[nil(x, t) = x]$.
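
To make Definition 1 concrete, here is a minimal sketch that spot-checks condition (i) and additivity on finitely many samples. It assumes a real-valued state space and uses `None` for "undefined"; the flows `decay` and `bounded_ramp` and the helper `respects_flow_axioms` are hypothetical illustrations, not notions from the text. A finite sample can refute the axioms but cannot prove them.

```python
import math

def nil(x, t):
    """Trivial flow: the state never changes."""
    return x

def decay(x, t):
    """Hypothetical total flow on the reals: exponential decay."""
    return x * math.exp(-t)

def bounded_ramp(x, t, ceiling=10.0):
    """Hypothetical partial flow: x grows at unit rate, undefined past the ceiling."""
    y = x + t
    return y if y <= ceiling else None  # None stands for "undefined"

def respects_flow_axioms(f, states, durations, tol=1e-9):
    """Spot-check f(x, 0) = x and additivity on the given samples."""
    for x in states:
        start = f(x, 0.0)
        if start is None or abs(start - x) > tol:
            return False
        for t1 in durations:
            mid = f(x, t1)
            if mid is None:
                continue  # premise of additivity fails, nothing to check
            for t2 in durations:
                end = f(mid, t2)
                if end is None:
                    continue
                whole = f(x, t1 + t2)
                if whole is None or abs(whole - end) > tol:
                    return False
    return True
```

A map such as `lambda x, t: x + t * t` fails the check, since flowing for one unit of time twice does not agree with flowing for two units at once.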

Suppose that the space $X$ is actually the Euclidean $\mathbb{R}^n$. Then, some flows could
be specified as solutions of appropriate differential equations, and some jumps, 
as linear transformations. Clearly, information of this kind may be crucial for 
those who, for example, compute reachable states of specific continuous systems. 
Nevertheless, this information is not part of the basic concepts we pursue. These 
are driven to some degree by mathematical curiosity: how can one lift classical 
automata theory to deal with continuous time or ascertain that this is impossible. 
As to potential applications, we content ourselves (following an apt expression of C. Rota)
with “hygienic prescriptions meant to guard us against potentially unpleasant 
complications”. Hopefully, this will become clearer in the concluding section 8 
(Discussion). 

A major goal is to highlight the distinctions and the similarities between the 
discrete and continuous tracks. Hence, the policy of building up definitional 
suggestions in an incremental/orthogonal style; this allows us to gradually estimate
the effect of introducing continuous time. 

Eventually, in spite of similar terminology, a continuous-time entity is differ- 
ent from (and more involved than) the corresponding one in discrete time. For 
example: 

(i) In discrete time, the execution of a transition is considered as an instan- 
taneous event; for continuous time the duration aspects become relevant. 

(ii) The unit delay is a finite automaton in the discrete case, whereas contin- 
uous time forces the delay to memorize an uncountable amount of information. 

On the other hand, some phenomena may appear characteristic of
continuous time merely because their discrete-time counterparts were not explicitly
identified previously. Hence the need to spell out the related
issues accurately and to highlight their universal character.

Below is a preliminary outline of what is to come. For many reasons (in
particular, because of the lack of precise and elegant terminological standards) we
are forced into an eclectic, informal style. The numerous and somewhat evasive
“Observations”, spread over the whole exposition, are intended mainly to help
compare the discrete and continuous tracks.

We start with a brief (and not fully conventional) presentation of some discrete-time
material, whose continuous-time analogs and/or mutants we would like to
understand. We then build on a generic definition of the automaton concept
via a system of axioms Ax borrowed from the Control-Theory monograph [SI].
Actually, this system is conceived for both discrete and continuous automata,
depending on whether the underlying time-domain TIME is the ordered set
Nat of natural numbers or the real line $\mathbb{R}^{\geq 0}$.

Continuous automata. Having chosen the real line, one can move toward more 
specific continuous automata via expedient concretization of Ax and keeping to 
the following principles: 

(i) Consistency with Ax; 

(ii) “Continuity” of automata refers only to their time-domain, i.e. to the 
real line, equipped with its standard order, metric and algebraic structure. As
to the input, output and state-spaces (alphabets), whether finite or infinite, they 
are handled as amorphous spaces with discrete topology. 

With these reservations, we mention immediately some germane features of 
continuous automata that are not visible, or have different effects in discrete 
time. Yet, none of them (nondeterminism, completeness and duration indepen- 
dence) is considered explicitly in the literature. 

On the other hand, on the intuitive level there is a consensus in the com- 
munity about restrictions to be imposed on signals manipulated by continuous 
automata. Namely, 

(i) “Realistic” input signals, whether with finite or with infinite duration, 
should enjoy the property known as finite variability (FV), i.e. avoidance of
Zeno-anomaly. 
(ii) Under the influence of such input signals, a continuous automaton should
either evolve continuously for some duration of time (a flow phase), or should 
be abruptly reset to a new value (a jump hit) from which subsequent continuous 
evolution occurs. 

At a lower degree of consensus is the following feature: 

(iii) (Flow robustness) The behavior of the automaton is not influenced by 
“sparse” deviations which do not affect the jump hits. 

We aim at a formalization of these intuitions in a way that is consistent with Ax.

Input/output. So far we have discussed outputless automata; we call them simply “automata”.
The term “transducer” is reserved for an automaton $M$ equipped
with a readout map of type $X \times U \to Y$, where $Y$ is a finite (!) output-space. As in
the discrete case, a transducer T computes (implements) an input/output op- 
erator F which is retrospective, i.e. the output-value at time t depends only 
on the values inputted not later than t. If the readout depends only on the 
current state of M it is called measurement. In this case, the i/o-operator F 
is strongly retrospective, i.e. the output at time t depends only on the values 
inputted before t. For a given retrospective operator F there may be different 
implementations. In most of the issues concerning circuits and/or control the 
essence is in F and not in its concrete implementation. This remark refers to 
both discrete and continuous time. 
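
In discrete time, the difference between a general readout and a measurement can be sketched as follows. This is an illustration under assumptions: inputs are finite Python lists, and the names `run_transducer`, `run_measurement` and `add` are hypothetical. With a general readout the output at step t may depend on the input at step t (retrospective); with a measurement it depends only on earlier inputs (strongly retrospective).

```python
def run_transducer(nextstate, readout, x, inputs):
    """Outputs of a transducer whose readout sees the state and the current input."""
    outputs = []
    for u in inputs:
        outputs.append(readout(x, u))
        x = nextstate(x, u)
    return outputs

def run_measurement(nextstate, measure, x, inputs):
    """Outputs when the readout is a measurement: it sees the current state only."""
    return run_transducer(nextstate, lambda state, u: measure(state), x, inputs)

# Hypothetical automaton: the state is the running sum of the inputs.
def add(x, u):
    return x + u
```

Changing only the last input changes the last output of the first transducer but leaves the last output of the measurement-based one untouched.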

In discrete time, strong retrospection and retrospection reflect sharply different
causal dependencies of output on input. Under appropriate (Baire) metrics,
retrospection of an operator $F$ implies its continuity. On the other hand,
strong retrospection implies the Lipschitz property, i.e. implies that $F$ is a contracting
map. This paves the way to using strongly retrospective operators in
fixed point techniques needed to tackle feedback in circuits and in control. Yet, 
in continuous time causality is more subtle and is not exhausted by the dilemma 
“retrospection vs. strong retrospection”; hence, the need to define and analyze 
more subtle properties of retrospective operators: predictivity, weak delaying, 
bounded output variability etc. 

Nondeterminism. In both discrete and continuous tracks it is handled in the ∃-setup.
Namely, a nondeterministic object Ob (automaton, transducer, operator)
is represented (implemented) by a deterministic object Ob' with appropriately 
hidden (shady) input. 

Interaction. A comprehensive taxonomy of interaction relies on the following 
dichotomies ([T4]): 

1. Synchrony versus asynchrony. 

2. Disjoint memory versus shared memory. 

Here, we are interested mainly in synchrony of automata (respectively, trans- 
ducers) with disjoint memory (“disjoint state-spaces”). 

The crucial point about systems of interacting transducers is that, unlike systems
of interacting automata, they are subjected to a specific feedback discipline;
we call such systems, in brief, circuits. Originally, they were identified as faithful,
idealized models of digital (!) computer circuits. This is the hardware view on
circuits. In [BW], Burks and Wright coined the name Logical Net for circuits 
composed of logical gates and delays working in discrete time. Circuits should 
be feedback-reliable, i.e. intuitively, the propagation of signals along closed cycles
should be causally faithful. In continuous time, feedback-reliability is more
subtle because of the danger of Zeno-anomaly (which does not exist in discrete time).

A control view on circuits. Here the focus is on the interaction between 
two main components, the plant and the controller. In discrete time, synthesis 
of control circuits amounts to finding appropriate controllers for given plants. 
In continuous time, synthesis includes also the design of a third component, 
namely the interface. As mentioned earlier, there is no commitment to specific 
techniques from differential equations and/or numerical analysis; however, one 
still has to face nontrivial features of the circuit’s components.



2 Discrete Time 

From now on, $X$, $U$, $Y$ are typical notations for state, input and output alphabets
(spaces); $U$ and $Y$ are finite. Typically, $u$ is an element of $U$, $\tilde{u}$ a path with
values in $U$ and, finally, $\tilde{U}$ the set of such paths. Similarly for other alphabets. A
deterministic discrete automaton $M$ is usually identified with a map
$nextstate : X \times U \to X$. When applied to a state $x$ and an input sequence $\tilde{u} = u(1) \ldots u(n)$,
the automaton returns a state-sequence $\tilde{x}$ of length $n + 1$. Namely,

$\tilde{x}(1) = x; \quad \tilde{x}(i + 1) = nextstate(\tilde{x}(i), u(i))$

Correspondingly, with nextstate are associated: 

1. The terminal transition map $\Psi : X \times \tilde{U} \to X$ which returns the value
$\tilde{x}(n + 1)$;

2. The full transition map $\tilde{\Psi} : X \times \tilde{U} \to \tilde{X}$ which returns the sequence
$\tilde{x}(1) \ldots \tilde{x}(n)$. Assume that an initial state is fixed. Then, $\tilde{\Psi}$ induces a strongly
retrospective input/state (i/s) behavior.

Observation 1. Nextstate, terminal and full transition maps can be uniquely 
restored from each other. Full transition makes sense for infinite input sequences. 
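
The three maps and one direction of Observation 1 can be sketched as follows, assuming finite input sequences given as Python lists. The function names and the `counter` automaton are illustrative only.

```python
def terminal_transition(nextstate, x, inputs):
    """Return the state reached after consuming the whole input sequence."""
    for u in inputs:
        x = nextstate(x, u)
    return x

def full_transition(nextstate, x, inputs):
    """Return the visited states x(1)...x(n), one per input (the final state is omitted)."""
    visited = []
    for u in inputs:
        visited.append(x)
        x = nextstate(x, u)
    return visited

def nextstate_from_terminal(terminal):
    """Restore nextstate from a terminal transition map via length-1 inputs."""
    return lambda x, u: terminal(x, [u])

# Hypothetical example: a mod-3 counter that advances by the input value.
def counter(x, u):
    return (x + u) % 3
```

Since `full_transition` lists the state before each input is consumed, its i-th entry depends only on earlier inputs, which is the strong retrospection mentioned above.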

Nondeterminism. The terminal and full transition-maps $\Psi_D$, $\tilde{\Psi}_D$ of a deterministic
automaton $D$ represent the terminal and full transition-relations $R_D$, $\tilde{R}_D$
of a ∃-automaton. Assume that the input-alphabet of $D$ is $U \times V$, and, correspondingly,
its input paths are $\langle \tilde{u}, \tilde{v} \rangle$. Then, the terminal transition-relation
$R_D$ and the full transition-relation $\tilde{R}_D$, which are implemented by $D$ via the
hiding (projecting out) of $V$, are as follows:

$R_D(x, \tilde{u}, x')$ iff $\exists \tilde{v}\,[\Psi_D(x, \langle \tilde{u}, \tilde{v} \rangle) = x']$

$\tilde{R}_D(x, \tilde{u}, \tilde{x})$ iff $\exists \tilde{v}\,[\tilde{\Psi}_D(x, \langle \tilde{u}, \tilde{v} \rangle) = \tilde{x}]$
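
Hiding the second input component can be sketched by enumerating all hidden paths. This is an illustration under assumptions: the hidden alphabet is finite and small, and the names `terminal_relation` and `d_nextstate` are hypothetical. The function returns the set of all terminal states related to a given state and visible input path.

```python
from itertools import product

def terminal_relation(nextstate, shady_alphabet, x, visible_inputs):
    """Set of states reachable from x on the visible inputs, for some hidden input path.

    nextstate takes (state, (u, v)) where v is the hidden ("shady") input.
    """
    reachable = set()
    # Existential quantification over hidden paths of the same length.
    for hidden in product(shady_alphabet, repeat=len(visible_inputs)):
        state = x
        for u, v in zip(visible_inputs, hidden):
            state = nextstate(state, (u, v))
        reachable.add(state)
    return reachable

# Hypothetical deterministic automaton D: on visible input "go" the hidden
# bit v decides whether the counter advances; on "stay" nothing changes.
def d_nextstate(state, pair):
    u, v = pair
    return state + v if u == "go" else state
```

Seen from outside, the deterministic `d_nextstate` behaves nondeterministically on "go": the counter may or may not advance.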
Since the ∃-implementation is not unique the following observation about the
implementation is instructive: 

Observation 2. (Robustness of ∃-implementation) If $R_D = R_{D'}$, then $\tilde{R}_D = \tilde{R}_{D'}$.

Circuits. The in-depth analysis of feedback and causality issues is an essential
aspect of circuit theory. Most regrettably, this point is often missing in the
literature. We recall some details below ([BW], [KT]):

At the most appropriate level of abstraction, a circuit $C$ of $k$ interacting
transducers $T_i$ communicating through their inputs and outputs, is specified
by a recursive system Eq of $k$ functional equations $Eq_i$ with the format
$x_i = f_i(y_1, \ldots, y_{m_i})$, $i = 1, \ldots, k$. Each equation describes one of $C$’s components, and $f_i$
is a retrospective operator, namely the i/o-behavior of the transducer $T_i$. All the
variables occurring in Eq are typed; those which occur only on the right hand side
of an equation (say $v_1, \ldots, v_m$) are declared as input variables of Eq, the others
(say $w_1, \ldots, w_n$) as its output variables. The conjunction $\bigwedge_i Eq_i$ defines a $(m+n)$-dimensional
relation $R(v_1, \ldots, v_m, w_1, \ldots, w_n)$ which, by chance, may happen to
be the graph of a total operator $\langle w_1, \ldots, w_n \rangle = F(v_1, \ldots, v_m)$. Say that $F$ is a
(functional) solution of Eq. For each $f_i$, let $f_i^{\delta}$ denote its $\delta$-truncation, i.e. the
restriction to paths with duration $< \delta$; respectively, consider the $\delta$-truncations
$Eq^{\delta}$ of the system Eq and their solutions. Only “good” solutions of Eq (if they
exist) are relevant for the input/output discipline.

Definition 2. The system Eq is well behaved if it has a unique i/o-solution $F$
with infinite duration; yet, in general, $F$ is not necessarily retrospective. Eq is
sound if $F$ is retrospective, and, moreover: for each finite $\delta$, the truncated system
$Eq^{\delta}$ has a unique $\delta$-restricted solution, which is also retrospective. A circuit is
sound (synonymously, feedback-reliable) if the corresponding system Eq is sound.

Observation 3. The system Eq is sound if it satisfies the conditions: 

(i) no two equations have the same left side, 

(ii) each cycle in Eq passes through a strongly retrospective operator. 
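
The two conditions of Observation 3 are purely syntactic and can be checked on the dependency graph of the equations. The sketch below assumes each equation is given as a triple (left variable, whether its operator is strongly retrospective, right variables); this encoding and the name `satisfies_soundness_conditions` are illustrative choices. Condition (ii) is tested by deleting the strongly retrospective equations and verifying that the remaining dependencies are acyclic.

```python
def satisfies_soundness_conditions(equations):
    """Check the two syntactic conditions on a system of equations.

    equations: list of (left_variable, strongly_retrospective, right_variables).
    """
    lefts = [left for left, _, _ in equations]
    # (i) no two equations share a left-hand side.
    if len(set(lefts)) != len(lefts):
        return False
    # (ii) drop the equations with strongly retrospective operators; the
    # remaining dependencies must be acyclic.
    depends_on = {left: [r for r in rights if r in lefts]
                  for left, strong, rights in equations if not strong}
    visiting, done = set(), set()

    def has_cycle(var):
        if var in done or var not in depends_on:
            return False
        if var in visiting:
            return True
        visiting.add(var)
        found = any(has_cycle(dep) for dep in depends_on[var])
        visiting.discard(var)
        done.add(var)
        return found

    return not any(has_cycle(var) for var in depends_on)
```

For instance, a gate whose output is fed back through a delay passes the check, whereas two gates feeding each other directly do not.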

Circuit $C^*$ with nondeterministic components $T_i^*$. Syntactically, instead
of equations $Eq_i$ with format “variable = term”, there appear inclusions with format
“variable $\in$ term”. Actually, according to the ∃-approach, $C^*$ is handled as
a circuit $C$ in which the $T_i^*$ are replaced by corresponding deterministic implementations
$T_i$; it is required that the shady input of $T_i$ does not appear in any
other equation. Hence, in $C$, beyond the input variables inherited from $C^*$, all
the shady variables have also the status of input variables. What remains to be
done is to check the soundness of $C$ and to hide the relevant shady inputs.

Observation 4. (Robustness). The semantics of $C^*$ does not depend on the
choice of the implementations $T_i$ for the given $T_i^*$.

Constraining and Control. Given an automaton $M$ one may be interested
only in those “good” trajectories or input-paths that meet some requirement
T. Let us look at safety requirements. A natural way to formalize them is
via an appropriate enabling relation $E$, where $E(x, a)$ means “in the state $x$
of $M$ the input $a$ is allowed (not prohibited)”. The relation $E$ may be identified
with the function $menu_E(x)$ which associates with $x$ the finite set of those $a$
that are enabled at $x$. Alternatively, one may use admissibility regions; namely,
$adm_E(u_i)$ is the set of states in which the given $u_i$ is enabled. Correspondingly,
a trajectory $\langle \tilde{u}, \tilde{x} \rangle$ of $M$ is $E$-safe iff it does not violate $E$ at any time-instant
$t$; in this case, the input-path $\tilde{u}$ is also said to be $E$-safe. Saying it otherwise,
the pair $\langle M, E \rangle$ defines a subautomaton $M'$ of $M$, whose trajectories are the
$E$-safe trajectories of $M$.
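
The notions of menu, admissibility region and safe input-path can be illustrated as follows. This is a sketch under assumptions: the enabling relation is a finite set of (state, input) pairs, and the `door` automaton with its relation `DOOR_ENABLED` is a made-up example.

```python
def menu(enabled, x):
    """Inputs enabled at state x, given E as a set of (state, input) pairs."""
    return {a for (state, a) in enabled if state == x}

def admissibility_region(enabled, a):
    """States at which input a is enabled."""
    return {state for (state, inp) in enabled if inp == a}

def is_safe(nextstate, enabled, x, inputs):
    """True if feeding the inputs from state x never uses a prohibited input."""
    for a in inputs:
        if (x, a) not in enabled:
            return False
        x = nextstate(x, a)
    return True

# Hypothetical two-state door: "open" is prohibited while already open.
def door(state, a):
    return "opened" if a == "open" else "closed"

DOOR_ENABLED = {("closed", "open"), ("closed", "close"), ("opened", "close")}
```

The input-path open, close, open is safe from the closed state, while open, open violates the enabling relation at its second step.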

Yet another universal way to constrain an automaton $M$ is to consider the
synchronization $N = M \times D$ with a companion automaton (“inhibitor”) $D$.

A combination of both ideas is via synchronized enabling. It is assumed that
$M$, $D$ have the same input-alphabet $U$ and disjoint state-spaces $X$, $S$; on the
other hand, the enabling $E$ is chosen for $N$ and, hence, induces admissibility
regions in the state-space $X \times S$. The projection of $\langle N, E \rangle$ into $X$ is the
intended constraining of $M$.

According to another popular constraining mechanism (by simple guards),
the admission or prohibition of a transition at the current state may depend
also on the value inputted at the preceding time-instant. This means that in
addition to E one considers also a guard relation G. Namely, $G(x, u_i, u_j)$ means:
“$u_j$ is not prohibited at state x if it was preceded by $u_i$”. Hence also the use of
guard regions $g(u_i, u_j)$ in addition to admissibility regions $adm(u_i)$.
Observation 5. Synchronized enabling is more powerful than enabling + simple
guards. In particular, finite-state inhibitors cover the effect of “regular” guards,
which take into account more than the immediately preceding input values.
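
One way to see why synchronized enabling covers simple guards is to take as inhibitor a one-step delay whose state is the previously read input. The sketch below assumes this encoding; the automaton `NEXT_M`, the regions `ADM` and `GUARD`, and all function names are hypothetical illustrations.

```python
# State of N = M x D is a pair (x, prev), where prev is the state of the
# delay D, i.e. the input read at the preceding time-instant.

def enabled(adm, guard, state, u):
    """(prev, x) is in the admissibility region of u iff x is in g(prev, u) and in adm(u)."""
    x, prev = state
    return x in adm[u] and x in guard[(prev, u)]


def step(next_m, state, u):
    """One synchronized step: M moves on u, the delay D stores u."""
    x, _prev = state
    return (next_m[(x, u)], u)


def is_safe(next_m, adm, guard, start, inputs):
    """True iff the input path respects the enabling of N at every step."""
    state = start
    for u in inputs:
        if not enabled(adm, guard, state, u):
            return False
        state = step(next_m, state, u)
    return True


STATES = {0, 1}
NEXT_M = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}
ADM = {"a": STATES, "b": STATES}
# Hypothetical guard: "b" is prohibited everywhere right after "b".
GUARD = {("a", "a"): STATES, ("a", "b"): STATES,
         ("b", "a"): STATES, ("b", "b"): set()}
START = (0, "a")  # assumed initial content of the delay
```

A finite-state inhibitor with more states than a plain delay could remember longer input histories, which is the sense in which "regular" guards are also covered.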

The constraining of a transducer P reflects rather the control view on
circuits. Here, the constraining of P is via synchronization with an appropriate
companion-transducer (“controller”) Con. It is required also that P together
with Con make up a feedback-reliable circuit C. In view of the specific, simple
structure of the intended C, feedback-reliability issues are simpler than in the
general “hardware” perception. The controller problem is understood as follows:
given a plant P and a requirement T, find (synthesize) the appropriate controller
Con or ascertain that this is impossible.

Observation 6. The celebrated synthesis problem in the theory of finite
automata and its solution by Büchi can be reformulated in these terms. Note that
it does not refer specifically to safety requirements; moreover, it can manage with
circuits that are acyclic and, hence, are trivially feedback-reliable. However, the
use of suitable cycles may be justified by additional performance criteria.



3 The System Ax: First Refinements 

In continuous time, the paths manipulated by M are signals, i.e. functions from
intervals $[t, t+\delta) \subseteq \mathbb{R}^{\geq 0}$ (here, $\delta$ may happen to be $\infty$) into the appropriate space
(alphabet). Hence, for continuous M, a nextstate-map does not make sense, and
it is natural to start directly with a terminal transition map $\Psi : X \times U \to X$.

Note that non-empty input-signals have some positive duration $\delta$. The intended
semantics of a terminal transition is: if state x occurs at time t, then u
with life-time $[t, t+\delta)$ produces the state x' at time $t + \delta$. The empty signal $\varepsilon$ is
sometimes used for the sake of algebraic aesthetics. Below, $u_1, u_2 \in U$ are finite
paths, and $u_1 \cdot u_2$ designates their concatenation. Here is the system Ax:

- (Ax1) Semi-group.

$\Psi(x, \varepsilon) = x$; $[\Psi(x, u_1) = x' \wedge \Psi(x', u_2) = x''] \rightarrow \Psi(x, u_1 \cdot u_2) = x''$

- (Ax2) Restriction (Density). Assume that $\Psi(x, u) = x''$. If u is the concatenation
$u_1 \cdot u_2$ then there exists x' such that $\Psi(x, u_1) = x' \wedge \Psi(x', u_2) = x''$.

- (Ax3) Non-triviality. For each state $x \in X$ there is a nonempty input-path
admissible at x.
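
The semigroup and density axioms can be checked on a toy example. The sketch below assumes a hypothetical automaton whose state is a number and whose input-paths are piecewise-constant rate signals, represented as lists of `(rate, duration)` segments; the names `psi`, `concat` and `split` are illustrative only.

```python
from fractions import Fraction

# An input path is a list of (rate, duration) segments with positive durations.
# The terminal transition map integrates the rate over the path.

def psi(x, u):
    """Terminal transition: state reached from x after reading path u."""
    for rate, dur in u:
        x = x + rate * dur
    return x


def concat(u1, u2):
    """Concatenation of two finite paths."""
    return u1 + u2


def split(u, t):
    """Cut path u at time t into a prefix of duration t and the remaining suffix."""
    prefix, suffix = [], []
    remaining = t
    for rate, dur in u:
        if remaining >= dur:
            prefix.append((rate, dur))
            remaining -= dur
        elif remaining > 0:
            prefix.append((rate, remaining))
            suffix.append((rate, dur - remaining))
            remaining = 0
        else:
            suffix.append((rate, dur))
    return prefix, suffix


u = [(Fraction(1), Fraction(2)), (Fraction(-3), Fraction(1))]
u1, u2 = split(u, Fraction(5, 2))
# Semigroup: reading u1 and then u2 gives the same state as reading u.
assert psi(psi(Fraction(0), u1), u2) == psi(Fraction(0), u)
# The empty path leaves the state unchanged.
assert psi(Fraction(7), []) == 7
```

Density holds here because every cut point of `u` yields an intermediate state `psi(x, u1)` from which `u2` leads to the same final state; exact rationals are used so that the equalities are not disturbed by rounding.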

Terminological/notational remarks. Wrt a signal z with values in some Z,
continuity, right-continuity, right limit etc. refer to the discrete topology. For
example, $z(t + 0) = a$ would mean that z is defined and has the constant value
a in some interval $(t, t + \delta)$. Say that an ordered set of time-instants is sparse
iff it is finite or an increasing sequence $t_1 < t_2 < \ldots < t_i < \ldots$ with $t_i \to \infty$.
“Almost” (wrt time) means “except a sparse set of time instants”.

Observation 7. The following two definitions are germane for continuous au- 
tomata but do not make sense for discrete automata. 

Definition 3. The automaton M is complete iff it satisfies the following
condition: consider a state $x \in X$ and an input path u; if every proper prefix u'
of u is admissible at x, i.e. $\Psi(x, u')$ is defined, then so is u.

It is important to distinguish metrical aspects of the theory, which deal with the
distances between time-instants, from duration-independent aspects that reflect
only the order of time-instants.

A stretching of the time-axis $\mathbb{R}^{\geq 0}$ is an arbitrary 1-1 monotonic mapping
$\rho : \mathbb{R}^{\geq 0} \to \mathbb{R}^{\geq 0}$. A path w is the $\rho$-stretching of the path v if

$\forall t \, (w(t) = v(\rho(t)))$.



Definition 4. The automaton M is duration-independent if each stretching $\rho$
of an input path causes the $\rho$-stretching of the corresponding state-path.

∃-nondeterminism. The formal definitions of the transition-relations are
exactly as in the discrete case. However, in continuous time it may happen that two
deterministic automata D, D' implement the same terminal transition-relation
but different full transition-relations. Hence,

Observation 8. The robustness mentioned earlier for discrete-time automata
fails, in general, for continuous automata. The conclusion: build on full
transition-maps.

Note that some popular versions of nondeterminism, in particular that of
nondeterministic delay ([MP1]), can be easily reformulated in (desugared to) the
∃-track, even though the usual definitions do not appear so.



4 Realistic Features of Signals

Say that z is an elementary path with duration $\delta \in \mathbb{R}^{>0} \cup \{\infty\}$ if for some
$a, b \in Z$ there holds: $z(0) = a$; $z(t) = b$ for $0 < t < \delta$. The corresponding
notation is $z = \langle a \cdot b, \delta \rangle$. Finite variability (FV) of a path u means that it
can be presented as a concatenation of elementary path-components; moreover,
the number of these components is finite (respectively, infinite) if the duration
of u is finite (respectively, infinite). M is an FV-automaton iff all its input-paths
have FV. Hence, up to semigroup closure, for FV-automata, it suffices to define
the terminal transition function $\Psi$ only for elementary input-paths. The
self-explanatory notation is $\Psi(x, u \cdot u', \delta)$.

In order to formalize the intuitive expectation about flows and jumps we
assume first that:

1. The input alphabet U is the disjoint sum of $U^{jump}$ (the jump-sub-alphabet with
generic elements $j_i$) and $U^{flow}$ (the flow-sub-alphabet with generic elements
$f_r$).

2. On each input-path u, the set of jump-instants is sparse.

Since we consider FV-automata, we have only to characterize elementary
transitions of two kinds:

$\Psi(x, f_i \cdot f_s, \delta)$ (pure flow) (*)

$\Psi(x, j_i \cdot f_s, \delta)$ (flow after jump) (**)

This is done (and checked to be consistent with Ax) as follows:

3. With each $f \in U^{flow}$ is associated a flow $\|f\|$ and, independently of $f_i$ (!),

$\Psi(x, f_i \cdot f_s, \delta) =_{def} \|f_s\|(x, \delta)$

4. With each $j \in U^{jump}$ is associated a jump $\|j\|$ and

$\Psi(x, j \cdot f, \delta) =_{def} \|f\|(\|j\|(x), \delta)$

Say that M is a jump/flow automaton if it can be specified as above. M is
a flow automaton if $U^{jump}$ is empty. If all the paths of M are right continuous it is
said to be a right-continuous automaton; clearly, this implies that M is a flow
automaton. M is a jump automaton if its only flow is nil.
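
The two kinds of elementary transitions can be sketched as a single function. Everything concrete below is an assumption made for illustration: the numeric state, the flows `heat` and `cool`, the jump `reset`, and the function name `elementary`.

```python
# Hypothetical flows: maps (state, duration) -> state that satisfy the
# semigroup law. Hypothetical jumps: maps state -> state.
FLOWS = {
    "heat": lambda x, d: x + d,
    "cool": lambda x, d: x - d,
}
JUMPS = {
    "reset": lambda x: 20,
}


def elementary(x, first, f_s, delta, flows=FLOWS, jumps=JUMPS):
    """Terminal transition on the elementary path <first . f_s, delta>.

    If `first` is a jump, it is applied to x before the flow f_s runs.
    If `first` is a flow, it is ignored: the result depends only on f_s.
    """
    if delta <= 0:
        raise ValueError("elementary paths have positive duration")
    if first in jumps:
        x = jumps[first](x)
    elif first not in flows:
        raise KeyError(first)
    return flows[f_s](x, delta)


# Pure flow: the preceding flow value does not matter.
assert elementary(5, "heat", "cool", 2) == elementary(5, "cool", "cool", 2) == 3
# Flow after jump: reset to 20, then heat for 3 time units.
assert elementary(5, "reset", "heat", 3) == 23
```

The independence from the first flow value in the pure-flow case is what later makes such an automaton insensitive to flow values at isolated instants.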

Two input-paths of M are flow-similar if the set of time-instants where they
differ is sparse, and at each such instant both have flow-values.

Observation 9. (Flow robustness) A jump/flow-automaton M does not distin- 
guish input-paths which are flow-similar. 

Duration Independence under Finite Variability. In this case an automaton
is duration independent if whenever $\Psi(q_1, a \cdot b, \delta) = q_2$ holds for some
duration $\delta$, it holds for arbitrary $\delta$. Hence, the $\delta$ argument may be omitted.
Clearly, a jump automaton is duration independent. The following is not trivial:



Proposition 1. ([R]) Every FV-automaton M with finite state-space is
duration independent.

It is worth noting that, even though this result is about FV-automata, the proof
in [R] leaves the world of finite variability and deals with arbitrary paths.

Example (trivial). Modeling a finite-state discrete automaton M with a
duration-independent automaton (actually with a jump automaton M'').

Consider M with state-space X, input-alphabet $A = \{a_1, \ldots, a_k\}$ and
transition-map $nextstate_M$. First, extend M to M' with input alphabet
$A' =_{def} A \cup \{nil\}$ and with additional (to $nextstate_M$) transitions
$nextstate_{M'}(x, nil) = x$. Now, the state and input-spaces of M'' are the same as for M'.
Its elementary transitions mimic those of M' in the most natural way: for each $u \in A'$

$\Psi_{M''}(x, u \cdot nil) = nextstate_{M'}(x, u)$ (1)

We confine ourselves to this trivial example and mention, in a somewhat vague form, a
general fact to be used in the sequel (it holds also for transducers).
Observation 10. In a strong sense, duration-independent automata and
transducers model (and are modeled by) discrete automata and transducers.
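
The trivial example above can be sketched as follows. The parity automaton, the table `NEXTSTATE` and the helper names are hypothetical; the point is only that the duration argument is checked for positivity and otherwise ignored.

```python
NIL = "nil"


def extend_with_nil(nextstate, states):
    """Build the transition table of M': the input nil leaves every state unchanged."""
    extended = dict(nextstate)
    for x in states:
        extended[(x, NIL)] = x
    return extended


def psi_jump(extended, x, u, delta):
    """Elementary transition of M'' on <u . nil, delta>; the result ignores delta."""
    if delta <= 0:
        raise ValueError("elementary paths have positive duration")
    return extended[(x, u)]


def run(extended, x, path):
    """Read a finite-variability path given as a list of (u, delta) components."""
    for u, delta in path:
        x = psi_jump(extended, x, u, delta)
    return x


# Hypothetical discrete automaton: "a" toggles parity, "b" does nothing.
STATES = {"even", "odd"}
NEXTSTATE = {
    ("even", "a"): "odd", ("odd", "a"): "even",
    ("even", "b"): "even", ("odd", "b"): "odd",
}
EXT = extend_with_nil(NEXTSTATE, STATES)
```

Running the same sequence of jumps with two different choices of durations ends in the same state, which is the duration independence claimed for jump automata.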



5 Circuits 

Observation 11. The soundness criterion for discrete-time (see Observation 3) 
fails for continuous time. The reason: strong retrospection does not support the 
needed fixed-point techniques. 

There are sufficient conditions that use additional properties of retrospective
operators like: predictivity, bounded variability of output signals etc. ([PRT,
T1]). Correspondingly, there are also some specific “anti-anomaly” continuous-time
primitives that differ from delays, but mimic somehow their behavior. Circuits
that make use of such primitives can be considered to some degree as models of
continuous-time hardware ([PRT]).

Observation 12. In continuous time, circuits over ∃-nondeterministic
components can be handled according to the discrete-time scenario modulo two
precautions. First, see Observation 11 and the accompanying comments. Second,
provide faithful ∃-implementations for the nondeterministic components.



6 Constraining Continuous Automata 

This is a straightforward adaptation of the discrete-time pattern (section 1).
Namely, the format of the enabling E is $E(x, a \cdot b)$, understood as “in the state
x of M the elementary input-paths $\langle a \cdot b, \delta \rangle$ are allowed (not prohibited)”.
Further, one can consider the associated $menu_E(x)$, admissibility regions
$adm_E(a \cdot b)$, E-safe trajectories and the E-safe subautomaton M' of M.

The definition of synchronized enabling is preserved, but note that the
inhibitor D is assumed to be a finite-state, hence duration-independent, automaton
(see Observation 10).

In order to handle guards, one considers, in addition to the enabling relation
E, also a guard-relation G, with format $G(x, u_i, a \cdot b)$. Here, $u_i$ is (in a very precise
sense) the flow that precedes a potential elementary input path $\langle a \cdot b, \delta \rangle$.
Correspondingly, guard-regions $g(u_i, a \cdot b)$ are considered. Finally,
Observation 13. With the reservations above, the former Observation 5 is valid
for continuous time as well.



7 Continuous-Time Control 

In continuous time, a control circuit C may contain, in addition to the Plant
P and Controller Con, also an interface I. Since P, Con, I are not subjected to
further decomposition, feedback reliability issues are simpler than for
hardware-oriented circuits.

We confine ourselves below to control circuits with deterministic (!) plants, that mimic
Sample-and-Hold (SH) architectures from Control Theory. Namely, the values y
outputted by the plant P are available to Con only at some sampling instants
$t_0 = 0 < t_1 < t_2 < \ldots$ The controller is updated at each $t_i$ and, on the basis of
the information available at $t_i$, it computes the output for the life-time $[t_i, t_{i+1})$.
In Control Theory, it is often assumed that $|t_{i+1} - t_i|$ is constant; in such cases
the similarity with discrete time is quite evident. However, in general one has to
build on appropriately scheduled sampling-instants.

A simple kind of control is selection. Assume a total P and some provisions
T imposed on its input paths. The task is to find appropriate deterministic Con
and I such that the circuit C over $\langle P, Con, I \rangle$ restricts P to a unique (!)
input-path which meets T, or ascertain that this is impossible. We are going to
illustrate this wrt safe selection. In this case it is assumed that the plant P is
represented explicitly as an automaton M equipped with a measurement h, and
that the requirement T is formulated in terms of an enabling relation E for M,
exactly as in the previous sections 1 and 5. Correspondingly are understood the
notions of E-safe trajectories, and “subautomaton M' of M” defined by the pair
$\langle M, E \rangle$.

Intuitively, a positive solution presumes that the current output of P can
serve as a basis for feedback control, i.e. it is informative enough and it grants
to the potential controller the chance to properly react in real time. Hence, the
assumption:

$h(x) = menu_E(x)$

Under this condition, the control circuit is required to select a unique trajectory
of M which is E-safe. Actually, the trajectory will have the additional property
of persistence. Namely, no change of the current flow occurs as long as it is
enabled. Clearly, this means that the circuit C should possess some kind of
“persistence ability”; namely it should be able to properly register and react to
border-crossings of admissibility regions.

An appropriate Sample-and-Hold architecture is depicted in the lower part 
of Fig. 1. Here, I is the interface, whereas P is represented via M and h. 

The controller Con is a finite-state (hence, a duration-independent)
transducer with input-alphabet $V = Y \cup \{nil\}$ and output-alphabet U. The
cardinality of the state-space Q may vary depending on the intended selection strategy.
Con inputs values $y \in Y$ at the sampling instants and inputs the value nil
otherwise.

Observation 14. Assume that the subautomaton M' satisfies two conditions:
finite variability and completeness. Then the circuit selects an input-path u with
finite variability which is both safe and persistent.

Here are some details that characterize the persistence ability of the circuit and
the importance of the completeness (!) assumption about M'. It turns out
that the interface I is responsible for the appropriate time-management of the
sampling instants. As shown in the middle part of Fig. 1, it consists of three
components. (Note the error in the picture: the dashed line directed to 3 should
be directed to 2.)

Component 1. This is the transducer Sieve with input y and, in general, with
two more “ticking” inputs etick and btick. In the deterministic architecture one
manages only with btick and

$v(t) = y(t)$ iff $btick(t) = tick$; otherwise $v(t) = nil$

At isolated time-instants Sieve outputs to Con the values $y \in Y$ to be sampled.

Component 3 implements a strongly retrospective operator; namely,
$u'(t) = u(t - 0)$.

Component 2. This is the Boundary Detector B which is at the heart of the
“persistence ability”. The role of B is to detect the next time-instant t (if any)
when the current flow $u_i$ is not enabled anymore, i.e. when $u_i \notin menu_E(x(t))$;
in other words, at t the border of the region $adm(u_i)$ is hit. According to the
expected persistence ability, a fresh sampling may occur only at such an instant. B
implements the operator:

$btick(t) = tick$ iff $u'(t) \notin y(t)$; otherwise $btick(t) = nil$

(Recall that $y(t) = h(x(t)) = menu_E(x(t))$.) Hence, unlike the other two
components, the Boundary Detector B is tailored especially to the concrete plant,
notably to $menu_E(x)$, which is just the output y. Note that all this works
due to the completeness assumption!
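
The interplay of plant, Boundary Detector, Sieve and controller can be illustrated by a rough simulation. The sketch below is not the continuous-time circuit itself: it assumes a fixed time grid (so border crossings are only detected when they fall on grid points), a hypothetical thermostat-like plant with flows `heat` and `cool`, and a hypothetical selection strategy that picks the alphabetically first enabled flow.

```python
from fractions import Fraction

# Hypothetical plant: the state x is a temperature, each flow has a constant rate.
RATES = {"heat": Fraction(1), "cool": Fraction(-1)}


def menu(x):
    """Plant output y = menu_E(x): the set of flows enabled at state x."""
    enabled = set()
    if x < 22:
        enabled.add("heat")
    if x > 18:
        enabled.add("cool")
    return enabled


def controller(y):
    """Assumed deterministic selection strategy: first enabled flow in sorted order."""
    return min(y)


def simulate(x0, steps, dt):
    """Return a trace of (time, state, current flow, btick) on a fixed grid."""
    x = x0
    u = controller(menu(x))        # first sampling instant t_0 = 0
    trace = []
    for k in range(steps):
        y = menu(x)
        btick = u not in y         # Boundary Detector: current flow no longer enabled
        if btick:
            u = controller(y)      # Sieve passes y to Con only on a tick
        trace.append((k * dt, x, u, btick))
        x = x + RATES[u] * dt      # plant follows the held flow until the next step
    return trace


trace = simulate(Fraction(20), 8, Fraction(1))
```

In this run the controller holds `cool` until the state reaches 18, switches to `heat`, holds it until 22, and switches back; sampling happens only at those border hits, which is the persistence behaviour described above.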

8 Hybrids 

According to Webster’s Dictionary a hybrid is “something heterogeneous
(consisting of dissimilar ingredients or constituents) in origin or composition”.
To be more precise we expect that:

(i) The heterogeneous constituents are two agents: the continuous one L and
the discrete (maybe even finite) one D.

(ii) In order to constitute a (hybrid) system, the two should be subjected to
an appropriate composition.

As suggested by our previous exposition it seems that: 

a) L should be, in general, a jump/flow automaton or a transducer with an 
underlying jump/flow automaton; sometimes one may confine to flow-automata. 

b) One should give up the tempting option to draw in discrete automata. 
Instead, in accordance with Observation 10, one can use their faithful duration 
independent models. 

c) For automata, the main composition rule is synchronization with disjoint
memory. For transducers, because of the relevance of feedback, it is natural to
use circuits. Actually, one more synchronization-like composition (but with fully
shared memory) seems to be appropriate. We call it “blend”; the definition
appears a bit later.

Proceeding from the assumptions above, we mention now three kinds of
hybrid systems, to which we refer as H1, H2, H3. Note that the first two (H1
and H2) refer to automata (hence, no involvement of input/output), whereas in
H3 the concern is about transducers (hence, also, the relevance of circuits and
feedback). On the other hand, H2 and H3 have natural counterparts in discrete
time, whereas H1 is based exclusively on continuous-time peculiarities. Here are
all the three:

H3 is a feedback-reliable control circuit. 

H2 (see the middle of Fig. 2) is the synchronized enabling (discrete version
in section 1, the continuous one in section 5). There is an extensive literature
dedicated to infinite-state generators of “safe” trajectories. Most of them
converge to the so called “Hybrid Automata” (HA), alleged to be a fundamental
paradigm which combines continuous and discrete dynamics. This model is, in
fact, a particular, disguised case of the H2-concept; unfortunately, the
underlying composition is not spelled out explicitly in [ACHP, ABD*, VdSS]. Here
are some details:

Picture H in Fig. 2 displays a widely accepted way to explain “Hybrid
Automata” as specific transition systems with labeling for edges (here, $g_{ij}$) and
vertices ($adm_i$ and $f_i$). The operational semantics characterizes the corresponding
safe trajectories, as runs which respect the “invariants” $adm_i$ and the guards
$g_{ij}$. The activities $f_i$ are usually specified by differential equations.

The corresponding H2-model is partially explained in the respective part of
Fig. 2. Here, the inhibitor D is a delay with inputs $u_1, u_2$ (the same as of M)
and states $s_1, s_2$. On the other hand, M is a flow automaton with flows borrowed
from H. Finally, the admissibility regions of N are as follows:

$(s_i, x) \in adm_N(u_j) =_{def} x \in g(u_i, u_j) \wedge x \in adm(u_j)$

H1 is the composition (tentatively called here “blend”) which applies to a
pair $\langle M_1, M_2 \rangle$, where $M_1$ is a jump-automaton and $M_2$ is a right-continuous
automaton (a particular case of flow automaton). The operation produces a
jump/flow automaton M; all the three have the same state-space X. Intuitively,
M inputs in parallel a jump-path of $M_1$ and a right-continuous path of $M_2$, as
suggested by part H1 in Fig. 2, and performs their mixture. Let $U_1, U_2, U$ be the
corresponding input alphabets. Here $U_1 =_{def} \{j_1, \ldots, j_m, nil\}$ and
$U_2 =_{def} \{f_1, \ldots, f_n\}$. Then, $U =_{def} U_1 \times U_2$, i.e. it consists of pairs $(j_i, f_r) \in U^{jump}$
(these are the jumps of M) and of pairs $(nil, f_r) \in U^{flow}$ (these are the flows of
M).



Definition 5. (Blend composition)

• A path z is admissible in M iff its projections are admissible in the respective
components. Hence,

• Elementary input-paths of M have (up to duration) one of the formats:

$\alpha =_{def} (j_i, f_s) \cdot (nil, f_s)$; $\beta =_{def} (nil, f_s) \cdot (nil, f_s)$ (2)

• Correspondingly, the transition map $\Psi_M$ is defined as follows:

$\Psi_M(x, \alpha, \delta) = f_s(j_i(x), \delta)$; $\Psi_M(x, \beta, \delta) = f_s(x, \delta)$ (3)
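
The blend of a jump automaton and a right-continuous automaton over a shared state-space can be sketched as below. The component maps `JUMP_MAPS` and `FLOW_MAPS`, the numeric state, and the names `blend_alphabet` and `blend_psi` are assumptions made for illustration.

```python
NIL = "nil"


def blend_alphabet(u1, u2):
    """Split the product alphabet U1 x U2 into the jumps and the flows of the blend."""
    jumps = {(j, f) for j in u1 if j != NIL for f in u2}
    flows = {(NIL, f) for f in u2}
    return jumps, flows


def blend_psi(x, first, f_s, delta, jump_maps, flow_maps):
    """Transition of the blend on the elementary path <first . (nil, f_s), delta>.

    `first` is (j, f_s) for a jump followed by flow f_s, or (nil, f_s) for pure flow.
    """
    j, f0 = first
    if f0 != f_s:
        # The flow component is right-continuous, so its value cannot change here.
        raise ValueError("flow component must be right-continuous")
    if delta <= 0:
        raise ValueError("elementary paths have positive duration")
    if j != NIL:
        x = jump_maps[j](x)       # the jump acts first on the shared state
    return flow_maps[f_s](x, delta)


# Hypothetical components acting on one shared numeric state.
JUMP_MAPS = {"reset": lambda x: 0}
FLOW_MAPS = {"fill": lambda x, d: x + 2 * d, "drain": lambda x, d: x - d}

JUMPS, FLOWS = blend_alphabet({"reset", NIL}, {"fill", "drain"})
```

Both components read and write the same state, which is the "fully shared memory" that distinguishes the blend from synchronization with disjoint memory.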



9 Discussion 

About the Sample-and-Hold architecture. Section 6 can be generalized in 
two orthogonal directions: 

(i) Instead of selection consider uniformization. This means that beyond the 
plant’s input u fed by the controller, the plant has an additional input e fed by the 
environment. On the other hand, the controller is required to guarantee correct 
selections in the face of all the behaviors of the environment. This generalization 
can be handled via a routine adaptation of selection. 

(ii) Instead of deterministic selection consider nondeterministic ones. This
needs a recasting of the deterministic architecture (section 6), in which all three
components were deterministic. In a nondeterministic architecture, the plant is
still deterministic; the ∃-nondeterminism is incarnate in the controller Con (see
the shady input tt in the lower part of Fig. 1) and in the interface I (the shady input
etick). There is a clear difference between:

1. Nondeterministic detection of sampling-instants. In addition to the internal
mechanism of boundary detection (which is deterministic and relies on the
completeness assumption), the timing ticks supplied by the
external shady input etick are relevant here.

2. Nondeterministic choice of the current flow (or flow after jump). This choice
has to be done in accordance with the current available menu; it is fully
the responsibility of the controller.

Synthesis of controllers in [ABD*] (it may be consulted for other references).
Here are the main features:

(i) Beyond safety, the important liveness property called viability
(nonblocking) is also handled.

(ii) There is no demand that control should be reached via feedback with an
explicitly produced companion-controller. In addition to these basic distinctions
we observe also:

(iii) There is no reference in [ABD*] to completeness of automata and to the
possible impact of this property on the subject.

And now, more details. 

Viability requires that each finite trajectory of the safely constrained
automaton should be extendible ad infinitum to a safe trajectory. Since, in general,
this is not the case, the following task arises: reinforce the original safety
constraints into new ones such that the automaton fits viability wrt them. The
authors develop a procedure (Algorithm 1) intended to produce the least
restrictive reinforcement of this kind. The constrained automaton produced by the
procedure above is called by the authors “controller”, whereas the procedure
itself is identified with the synthesis of the controller.

Further (quotation from [ABD*] follows): “... The termination of the pro- 
cedure, however, cannot be guaranteed... Moreover, the implementation is very 
complicated... Some aspects of the technique take advantage of special proper- 
ties of linear systems. Given this state-of-affairs we must resort to the classical 
solution of continuous mathematicians, numerical approximation.” 

But, beyond these troubles, one more fact is worth attention. One can show 
that the procedure may produce an incomplete automaton even if the original 
one is complete. This may happen to be relevant for the next point. 

Referring to the determinization of their controller, [ABD*] claim (without
proof): “This can be fixed later ... and the feed-back map will become a
function...” Indeed, this is evident wrt the nondeterministic choice of the current
flow. Moreover, under the assumption that the resulting “controller” is complete,
the techniques of boundary crossings would also allow one to fix the nondeterminism
of time-detection. However, as we just mentioned, this argument cannot be used.
Actually, we doubt whether this can be cured in some other way.

About Hybrids. Nice names may happen to precede formal concepts and, 
eventually, to engender different concepts. Hence, the controversial question: 
which of them captures better the intended reality we want to understand and 
deserves to preempt this name. By now there are various versions of “Hybrid 
System” and the related Control problems. 

The conceptual/notational approach in [H] (which may be consulted for fur- 
ther references), focuses on “Hybrid Automata”. It differs from that advocated 
in this paper as follows: 

(i) No consideration of operators/transducers, feedback reliability. 

(ii) Use of instantaneous transitions. Remember that, according to Ax, the only
instantaneous transition is associated with the empty signal $\varepsilon$.

(iii) Multiform time. This amounts to breaking the time-axis into a
sequence of closed intervals, which may reduce to single points. Accordingly,
legitimacy is given to “signals” (called in [MP3] “functions which are not
really functions”) that may have different values at the “same” time-instant.
Clearly, this is inconsistent with Ax, whose signals are genuine functions.

(iv) Inclusion of asynchrony in the basic model of Hybrid Automata. 

(v) No explicit representation of the hybrid as a pair of interacting components. 

The last shortcoming is criticized and remedied in [OD] for timed automata 
(a particular case of hybrid automata). 

Quotation, “...real-time system can be decomposed into an untimed system 
communicating with suitable timers. Both synchronous and asynchronous com- 
munications are considered... At first sight it seems that the decomposition... is 
already solved by the model of timed automata... However, the main difference... 
is that in the timed automata model the clock operations are indivisible cou- 
pled with the transitions whereas here we present a clear separation of untimed 
system and timers with explicit communication between them”. 

In [DN] Hybrid Control Systems are explicitly structured as interacting 
plants, controllers and interfaces. However, feedback-reliability issues are not 
considered. In [DN] there is no explicit reference to completeness; on the other 
side, there is a clear, inspiring presentation of the border-crossing mechanism, 
which suggests the importance of this property. 

Circuits in [MP1]. In terms comparable with section 4, it seems that the
intention is to consider circuits (synchronous!) of nondeterministic retrospective operators
(the authors use the name “Asynchronous Circuits”). In this case, the subject
could be handled after the scenario from Observation 12; the result would be
that the circuits under consideration are indeed feedback reliable. However,
being absorbed by specific applications, the authors ignore this crucial aspect of
circuitry. What they call “solution” of the relevant system $Eq = \{Eq_i\}$ is,
actually, the relation $R = \bigwedge Eq_i$ (see section 1) irrespective of the question whether Eq
is sound. This deviation from the “circuit”-philosophy is visible already for
discrete time. The full scenario would require also to present an ∃-implementation
for the nondeterministic continuous delay considered in [MP1].

Control Theory. Here, Continuous Automata (Dynamical Systems) are
specified by differential equations. But note that models with discrete time (albeit
with continuous data) are also considered ([S2]).

In ([A,S2]) hybrids are treated as circuits of appropriate transducers: plants,
interfaces, controllers. In [A] a very non-trivial interface is used; the invention of
the interface and the check that the circuit is feedback reliable are beyond the
concepts and techniques of classical automata-theory.

In [A,S2] the controller is implemented as a timed automaton. According to 
our definitions, this might be interpreted as follows: in addition to the main Plant 
and Interface, one uses some auxiliary primitive plants like Timers etc. From this 
perspective, the controller is again a finite (and, hence, a duration-independent) 
automaton.



1 Introduction

Verification of complex systems cannot be achieved without combining several 
analysis methods and techniques. A widely adopted approach consists in com- 
bining abstraction methods with algorithmic verification techniques. Typically, 
finite abstractions are built using automatic or semi-automatic methods and 
model-checking algorithms are applied on these abstractions in order to verify 
the desired properties. However, finding faithful finite abstractions is often hard 
since many aspects in the behavior of the system must be hidden or encoded in 
a nontrivial and ad-hoc manner. This is particularly true for software systems 
since their behavior depends in a very crucial manner on the manipulation of 
data structures and variables which are assumed to range over infinite domains 
(e.g., unbounded stacks, queues, arrays, counters), or over finite domains whose 
sizes are left as parameters. Moreover, many systems are defined as networks 
of parametric size, i.e., they are assumed to work for an arbitrary number of 
processes running in parallel. 

Hence, there is a real need (1) to define models that capture essential
aspects which are beyond the expressive power of finite models (e.g.,
manipulation of unbounded variables, parametrization), and (2) to develop algorithmic
verification techniques which can be applied to these infinite-state models.

In this paper, we consider models based on rewriting systems and we develop
an algorithmic approach for analyzing automatically such models. In the
framework we adopt, configurations of systems are seen as words or vectors of words,
and actions are represented by means of rewriting rules. Different rewriting
policies can be considered, e.g., prefix, cyclic, or factor rewriting. They allow modeling
different classes of systems, e.g., pushdown systems, systems communicating
through FIFO channels, or parametrized networks of identical finite-state
processes connected according to a linear topology.

Then, the main problem we address is the problem of computing a
representation of the infinite set of all reachable configurations in a model. Solving this
problem is indeed the kernel of most of the verification methods. In our setting,
this problem relies on computing the closure of a language by a rewriting system,
i.e., given a rewriting system R and a language $\phi$, compute $R^*(\phi)$, where $R^*$ is
the reflexive-transitive closure of the relation induced by R. We present several
closure results concerning different classes of languages and rewriting systems,
and we show the applications of these results in symbolic reachability analysis
of different infinite-state systems. The results we present in this paper are not
new. Our aim here is to present a general approach for algorithmic verification
of infinite-state systems, and to show in a simple and uniform manner several
results we have established in the last few years.
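
To make the notion of closure under rewriting concrete, here is a small sketch for prefix rewriting. It is not one of the symbolic constructions discussed in the paper: it enumerates words explicitly and, since the closure is in general infinite, assumes a hypothetical length bound `max_len`, so it only under-approximates the closure. The rule set and function names are illustrative.

```python
def prefix_successors(rules, word):
    """One prefix-rewriting step: a rule (lhs, rhs) turns lhs + v into rhs + v."""
    return {rhs + word[len(lhs):] for lhs, rhs in rules if word.startswith(lhs)}


def bounded_closure(rules, language, max_len):
    """Words reachable from `language` through words of length at most max_len."""
    reached = {w for w in language if len(w) <= max_len}
    frontier = set(reached)
    while frontier:
        new = set()
        for w in frontier:
            for v in prefix_successors(rules, w):
                if len(v) <= max_len and v not in reached:
                    new.add(v)
        reached |= new
        frontier = new
    return reached


# Hypothetical rules: duplicate a leading "a", or pop "a" in front of "b".
RULES = [("a", "aa"), ("ab", "b")]
closure = bounded_closure(RULES, {"ab"}, max_len=4)
```

Words that can only be reached through intermediate words longer than `max_len` are missed by this enumeration, which is one reason finite symbolic representations of the full closure are of interest.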

The paper is organized as follows: In Section 2 we present the principle of
a general algorithmic verification approach based on symbolic reachability
analysis. In Section 3 we define classes of rewriting systems and show their use as
models for various kinds of infinite-state systems. In Section 4 we present
results on the computability of the closure of languages by rewriting systems, and
show their relevance in verification. Finally, in Section 5 we give a presentation
of related work.

2 Symbolic Reachability Analysis 

A system can be modeled as a pair $(C, \rho)$ where C is the set of all possible
configurations of the system, and $\rho \subseteq C \times C$ is a binary transition relation between
configurations.

Given a relation $\rho$, let us denote by $\rho^i$ the relation obtained by i compositions
of $\rho$, i.e., $\rho^0$ is the identity relation, and for $i \geq 0$, $\rho^{i+1} = \rho^i \circ \rho$. Then, let $\rho^*$ be
the reflexive-transitive closure of the relation $\rho$, i.e., $\rho^* = \bigcup_{i \geq 0} \rho^i$.

Given a relation $\rho$ and a configuration $\gamma$, let $\rho(\gamma) = \{\gamma' \in C : (\gamma, \gamma') \in \rho\}$.
Intuitively, $\rho(\gamma)$ is the set of all immediate successors of the configuration $\gamma$, and
$\rho^*(\gamma)$ is the set of all reachable configurations from $\gamma$. These definitions can be
generalized straightforwardly to sets of configurations.
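
For a finite relation given explicitly as a set of pairs, the successor image and the reachable set can be computed by a simple fixpoint iteration. The sketch below assumes such an explicit finite representation, which is exactly what is not available for infinite-state systems; the names `post` and `reach` are illustrative.

```python
def post(rho, configs):
    """Image of a set of configurations under rho: all immediate successors."""
    return {t for (s, t) in rho if s in configs}


def reach(rho, init):
    """All configurations reachable from init under the reflexive-transitive closure."""
    reached = set(init)
    frontier = set(init)
    while frontier:
        frontier = post(rho, frontier) - reached
        reached |= frontier
    return reached


RHO = {(0, 1), (1, 2), (2, 0), (3, 4)}
```

The loop terminates here because only finitely many configurations exist; each round adds the successors not seen before, mirroring the union of the iterated compositions.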

Verification problems, especially for safety properties, can often be reduced
to the reachability analysis problem, i.e., to computing the set of all reachable
configurations starting from a given (possibly infinite) set of initial configurations
$\phi \subseteq C$. In our setting, this consists in computing the set $\rho^*(\phi)$. More precisely,
the problem is to construct a finite representation of the set $\rho^*(\phi)$, given a finite
representation of the set $\phi$. Then, the central problem we address can be stated
as follows:

($\Pi$): identify classes of recursive binary relations $\mathcal{R}$ between configurations as
well as classes of finite representation structures $S_1$ and $S_2$ corresponding
to two classes of sets of configurations, such that for every effectively
$S_1$-representable set $\phi$ and every relation $\rho \in \mathcal{R}$, the set $\rho^*(\phi)$ is effectively
$S_2$-representable.

In order to be relevant to system verification, this problem must be addressed
for classes of representation structures enjoying some minimal closure
and decidability properties (e.g., the decidability of the inclusion test). Often,
it is interesting to consider a stronger version of the problem above, where we
require that $S_1$ and $S_2$ are the same class. Of course, few classes of models for
practical infinite-state systems have a decidable reachability problem. Hence,
it is clear that the verification problem of infinite-state systems cannot be reduced
in general to finding a solution to the problem ($\Pi$). However, solutions
to the problem ($\Pi$) can be embedded in a more general (or more pragmatic)
approach in order to tackle classes of infinite-state systems with an undecidable
reachability problem. The idea is the following: if we cannot provide
an algorithm for computing the set $\rho^*(\phi)$ directly, we adopt a semi-algorithmic
approach (i.e., termination is not guaranteed) based on an iterative exploration
of the reachable configurations. In order to speed up this procedure and help its
termination, we use within this exploration solutions for the problem ($\Pi$) concerning
subrelations of $\rho$. That is, at each iteration we compute the $\rho$-image of
the reachable configurations found so far, as well as, when possible, their images
by transitive closures of some (compositions of) subrelations of $\rho$. Hence, solutions
for the problem ($\Pi$), even if they concern restricted classes of relations, can
be relevant for enhancing the iterative reachability analysis procedure, provided
that the computed intermediate sets belong to the same class of representation
structures.

Let us see this in more detail. From the definition of the set $\rho^*(\phi)$, a procedure
for computing it would be to construct iteratively the non-decreasing
sequence of sets $\phi_i = \bigcup_{0 \leq j \leq i} \rho^j(\phi)$ until $\phi_{k+1} \subseteq \phi_k$ for some $k \geq 0$.
In such a case, we have necessarily $\phi_k = \rho^*(\phi)$. Actually, the sequence $(\phi_i)_{i \geq 0}$
can be computed by taking

$\phi_0 = \phi$

$\phi_{i+1} = \phi_i \cup \rho(\phi_i)$

Of course, in order to be able to apply this procedure, we need a class of
representation structures $S$ such that: (1) $\phi$ is $S$-representable, (2) $S$ is effectively
closed under union and computing the $\rho$-image, and (3) the inclusion test is
decidable in $S$.
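
As a minimal sketch of the naive procedure above, assume that explicit finite Python sets play the role of the representation class, so that union, image and inclusion are trivially effective. The names `post` and `naive_reach` and the toy counter system are illustrative assumptions, not taken from the paper.

```python
def post(rho, confs):
    """Image of a set of configurations under rho, given as a set of pairs."""
    return {t for (s, t) in rho if s in confs}


def naive_reach(rho, init):
    """Iterate phi_{i+1} = phi_i U rho(phi_i) until phi_{k+1} is included in phi_k."""
    phi = set(init)
    while True:
        nxt = phi | post(rho, phi)
        if nxt <= phi:  # inclusion test: no new configuration was found
            return phi
        phi = nxt


# Hypothetical toy system: a counter modulo 5 that steps by 2.
rho = {(i, (i + 2) % 5) for i in range(5)}
reach = naive_reach(rho, {0})
```

On a finite configuration space the loop always stops; on an infinite one the same loop may run forever, which is the motivation for the acceleration discussed next.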

However, it is obvious that this naive procedure does not terminate in general
(in all nontrivial cases where $C$ is infinite). Therefore, we enhance this procedure
using a fixpoint acceleration technique, according to the terminology used in the
abstract interpretation community [CC77].

Let us first introduce some notation. Given a finite set of relations $\mathcal{R}$, we
denote by $Comp(\mathcal{R})$ the smallest set of relations which contains $\mathcal{R}$ and which is
closed under the operations of union ($\cup$) and composition ($\circ$) of relations.

Now, let $\rho$ be a relation and $\phi$ be a set of initial configurations representable
in a class of representation structures $S$. A new reachability analysis procedure
for computing $\rho^*(\phi)$ can be defined by considering a decomposition

$\rho = \rho' \cup \rho_1 \cup \ldots \cup \rho_m$

and by defining a finite set of relations $\theta_1, \ldots, \theta_n \in Comp(\{\rho_1, \ldots, \rho_m\})$ such
that it is possible to compute and to represent in $S$ the set $\theta_i^*(\psi)$, for each
$i \in \{1, \ldots, n\}$ and for every $\psi$ in the class $S$. Typically, the decomposition of
$\rho$ we consider is extracted from its definition as a finite union of relations, each
of them corresponding to a possible action (or set of actions) in the modeled
system, and very often, the $\theta_i$'s can be chosen to be the $\rho_i$'s themselves.

Then, the new iterative procedure we apply consists in computing the
non-decreasing sequence of sets $(\psi_i)_{i \geq 0}$ defined as follows:

$\psi_0 = \phi$

$\psi_{i+1} = \psi_i \cup \rho'(\psi_i) \cup \theta_1^*(\psi_i) \cup \ldots \cup \theta_n^*(\psi_i)$

until $\psi_{k+1} \subseteq \psi_k$ for some $k \geq 0$. Clearly, we have for every $i \geq 0$, $\phi_i \subseteq \psi_i$ and
$\psi_i \subseteq \bigcup_{j \geq 0} \rho^j(\phi) = \rho^*(\phi)$. This means that, if this procedure stops, it computes
precisely an $S$-representation of the set $\rho^*(\phi)$.

The procedure described above generates the set of reachable configurations
according to a breadth first search strategy, using additional transitions called
meta-transitions (as in [BW94]), each of them corresponding to iterating an
arbitrary number of times the application of a transition relation $\theta_i$, for
$i \in \{1, \ldots, n\}$. Actually, different search strategies may be adopted for generating
the set of reachable configurations, e.g., a depth first search strategy with priority
to meta-transitions.

Of course, the method proposed above does not guarantee termination. The
reachability analysis procedure terminates only if we can define a suitable finite
set of meta-transitions. This is obviously related to our ability to find solutions
to the problem ($\Pi$) stated at the beginning of this section.
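
The following sketch illustrates the accelerated iteration on a deliberately small, hypothetical system that is not taken from the paper: configurations are natural numbers, the relation is the union of a reset (5 goes to 0, playing the role of the non-accelerated part) and an increment (n goes to n+1, used as the single meta-transition). Sets are represented symbolically as a finite part plus an optional threshold t standing for all n >= t. All names are assumptions made for illustration.

```python
def norm(fin, t):
    """Canonical form of the set fin U {n : n >= t} (t is None for no infinite part)."""
    fin = set(fin)
    if t is not None:
        fin = {n for n in fin if n < t}
        while t - 1 in fin:  # merge adjacent points into the infinite part
            t -= 1
            fin.discard(t)
    return (frozenset(fin), t)


def union(a, b):
    ts = [t for t in (a[1], b[1]) if t is not None]
    return norm(a[0] | b[0], min(ts) if ts else None)


def included(a, b):
    """Inclusion test on canonical representations: is a a subset of b?"""
    fin_ok = all(n in b[0] or (b[1] is not None and n >= b[1]) for n in a[0])
    inf_ok = a[1] is None or (b[1] is not None and b[1] <= a[1])
    return fin_ok and inf_ok


def reset_image(s):
    """Image under the reset relation 5 -> 0."""
    fin, t = s
    has5 = 5 in fin or (t is not None and t <= 5)
    return norm({0} if has5 else set(), None)


def inc_star(s):
    """Image under the meta-transition: reflexive-transitive closure of n -> n + 1."""
    fin, t = s
    lows = list(fin) + ([t] if t is not None else [])
    return norm(set(), min(lows)) if lows else norm(set(), None)


def accelerated_reach(init):
    psi, steps = init, 0
    while True:
        nxt = union(union(psi, reset_image(psi)), inc_star(psi))
        if included(nxt, psi):
            return psi, steps
        psi, steps = nxt, steps + 1


result, steps = accelerated_reach(norm({2}, None))
```

Starting from the single configuration 2, the naive iteration would never stabilize, whereas the sketch reaches the representation of all natural numbers after three accelerated steps.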



3 Models Based on Rewriting Systems 

We consider here models which correspond to systems operating on sequential 
data structures (such as stacks or queues). In these models, configurations are 
vectors of words, and transition relations between configurations are defined by 
means of sets of rewriting rules. 



3.1 Rewriting Systems 

Let $\Sigma$ be a finite alphabet (set of symbols). For $n \geq 1$, an n-dim rewriting rule $r$
over $\Sigma$ is a pair $(x, y)$ where $x, y \in (\Sigma^*)^n$. We denote such a rule by $r : x \mapsto y$.
The left hand side (resp. right hand side) of $r$, denoted by $lhs(r)$ (resp. $rhs(r)$),
is the vector $x$ (resp. $y$).

An n-dim rewriting system is a finite set of n-dim rewriting rules. We consider
hereafter three notions of rewriting relations between vectors of words. Given
an n-dim rewriting system $R$, the prefix (resp. cyclic, factor) rewriting relation
associated with $R$ is the relation $R_p$ (resp. $R_c$, $R_f$) $\subseteq (\Sigma^*)^n \times (\Sigma^*)^n$ such that
for every $u = (u_1, \ldots, u_n)$, $v = (v_1, \ldots, v_n) \in (\Sigma^*)^n$, $(u, v) \in R_p$ (resp. $R_c$, $R_f$)
if and only if there exists a rule $r : (x_1, \ldots, x_n) \mapsto (y_1, \ldots, y_n) \in R$ such that
for every $i \in \{1, \ldots, n\}$, we have respectively,

(Prefix rewriting) $\exists w_i \in \Sigma^*$. $u_i = x_i w_i$ and $v_i = y_i w_i$,

(Cyclic rewriting) $\exists w_i \in \Sigma^*$. $u_i = x_i w_i$ and $v_i = w_i y_i$,

(Factor rewriting) $\exists w_i, w_i' \in \Sigma^*$. $u_i = w_i x_i w_i'$ and $v_i = w_i y_i w_i'$.
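
For the 1-dim case, the three one-step relations above can be sketched as follows, assuming words are Python strings and a rewriting system is a set of (lhs, rhs) pairs. The function names and the sample rule are illustrative assumptions.

```python
def prefix_step(rules, u):
    """All v such that u = x.w and v = y.w for some rule (x, y)."""
    return {y + u[len(x):] for (x, y) in rules if u.startswith(x)}


def cyclic_step(rules, u):
    """All v such that u = x.w and v = w.y for some rule (x, y)."""
    return {u[len(x):] + y for (x, y) in rules if u.startswith(x)}


def factor_step(rules, u):
    """All v such that u = w.x.w' and v = w.y.w' for some rule (x, y)."""
    out = set()
    for (x, y) in rules:
        for i in range(len(u) - len(x) + 1):
            if u[i:i + len(x)] == x:
                out.add(u[:i] + y + u[i + len(x):])
    return out


rules = {("ab", "c")}
```

Prefix rewriting replaces the left hand side at the front of the word (a stack), cyclic rewriting removes it at the front and appends the right hand side at the back (a queue), and factor rewriting replaces it anywhere in the word.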






A. Bouajjani 



3.2 Models of Infinite-State Systems 

The models we consider are defined as pairs $(C, \rho)$ where the set of configurations
is $C = (\Sigma^*)^n$, and the transition relation $\rho$ is one of the rewriting relations $R_\sharp$,
with $\sharp \in \{p, c, f\}$, for some given rewriting system $R$.



Automata with unbounded sequential data structures: The very com- 
mon models of pushdown systems and FIFO-channel systems can be straight- 
forwardly represented in our framework. 

Indeed, prefix rewriting models the actions of a system manipulating push- 
down stacks, and cyclic rewriting corresponds to operations on FIFO queues (or 
communication channels) . One additional dimension in the rewriting system can 
be used to encode the control states. 

For instance, consider a system which manipulates one pushdown stack (resp.
one FIFO queue). A rule $r : (a, x) \mapsto (b, y)$ where $a, b \in \Sigma$ and $x, y \in \Sigma^*$
represents the action of (1) testing whether the sequence of symbols $x$ can be
removed from the stack (resp. the queue), and if yes, (2) moving from the control
state $a$ to the control state $b$, and putting the sequence $y$ into the stack (resp.
the queue) after having removed $x$ from it.

In the sequel, we call an n-dim controlled rewriting system any set of $(n+1)$-dim
rules of the form $(a, x) \mapsto (b, y)$ where $a, b \in \Sigma$ and $x, y \in (\Sigma^*)^n$.



Parametrized networks: We use factor rewriting for modelling parametrized
systems with an arbitrary number of identical finite-state components (processes),
connected according to a linear topology.

Let $\Sigma$ be the finite set of states of each of these components. Then, in order
to reason uniformly about the family of systems with an arbitrary number of components,
we consider that a configuration is a finite word over $\Sigma$, the $i$-th element
of the word corresponding to the state of the $i$-th component, and various classes
of languages (e.g., regular languages) can be used to represent sets of configurations
of arbitrary lengths. Therefore, actions of such parametrized systems
can be represented naturally as rewriting rules, each of them corresponding to
simultaneous moves in some components of the system. The kind of rules we
consider here allows us to represent moves involving a finite number of processes
located within a bounded distance from each other. Typically, communication
(rendez-vous) between immediate neighbors can be modeled by rewriting rules
of the form $ab \mapsto cd$, meaning that if two processes $i$ and $i+1$ are in states $a$
and $b$ respectively, then they can move simultaneously to their respective states
$c$ and $d$.

Take as an example a simple version of the token-passing mutual exclusion
protocol: We assume that processes are arranged linearly. A process that owns
the right to enter the critical section (the token) can transmit it to its right
neighbor. Each process has two possible states, either 1 if it owns the token, or
0 otherwise. We suppose that initial configurations are all those in which the
leftmost process has the token. Since the number of processes is not fixed, the
set of initial configurations is precisely the language $10^*$. Then, the transition
relation between configurations, which models the action of passing the token
from left to right, corresponds to the relation $R_f$, where $R = \{10 \mapsto 01\}$. It is
easy to see that the set of reachable configurations is $R_f^*(10^*) = 0^*10^*$.
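
The claim can be checked on bounded instances with a small sketch: for each fixed number of processes n, an explicit search from the initial word (a 1 followed by n-1 zeros) should produce exactly the words of length n in 0*10*. This is only a sanity check for small n under the stated rule, not a proof; the helper names are illustrative.

```python
import re

RULE = ("10", "01")  # pass the token to the right neighbour


def successors(word):
    """One step of factor rewriting with the single rule 10 -> 01."""
    x, y = RULE
    return {word[:i] + y + word[i + 2:]
            for i in range(len(word) - 1) if word[i:i + 2] == x}


def reachable(word):
    """Explicit exploration of all words reachable from `word`."""
    seen, todo = {word}, [word]
    while todo:
        for nxt in successors(todo.pop()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def check(n):
    got = reachable("1" + "0" * (n - 1))
    expected = {"0" * i + "1" + "0" * (n - 1 - i) for i in range(n)}
    return got == expected and all(re.fullmatch(r"0*10*", w) for w in got)


ok = all(check(n) for n in range(1, 8))
```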

4 Results 

We present in this section solutions of the problem ($\Pi$) when the class of representation
structures corresponds to (subclasses of) recognizable sets. Let us recall
that an n-dim recognizable set is a finite union of sets of the form $L_1 \times \ldots \times L_n$
where each $L_i$ is a regular set (i.e., FSM definable), for $i \in \{1, \ldots, n\}$.

Clearly, the class of recognizable sets enjoys the closure and decision properties
required from symbolic representation structures. Indeed, this class is closed
under all boolean operations, it is also closed under the application of regular
relations (notice that the relation $R_\sharp$, with $\sharp \in \{p, c, f\}$, is obviously regular for
any rewriting system $R$), and its inclusion problem is decidable.



4.1 Prefix Rewriting 



The following theorem has been proved several times by authors from different
communities with different motivations (see e.g., [Cau92, BEM97, FWW97]).

Theorem 1. Let $R$ be a 1-dim controlled rewriting system. Then, for every
effectively recognizable set $\phi$, the set $R_p^*(\phi)$ is effectively recognizable.

In [BEM97, FWW97], versions of this theorem are used to define verification
algorithms for pushdown systems against different specification logics (temporal
logics and $\mu$-calculi). Reachability analysis and model checking techniques for
pushdown systems have applications in the domain of program analysis [EK99, ES01].

Unfortunately, there is no algorithm which constructs the set $R_p^*(\phi)$ for any
2-dim rewriting system $R$ and recognizable set $\phi$. This can be shown by a
straightforward reduction of the Post correspondence problem.



4.2 Cyclic Rewriting 

It is well known that a finite automaton equipped with a FIFO queue is as
powerful as a Turing machine. So, the $R_c^*$ image is not computable in general
for 1-dim controlled rewriting systems. Moreover, such a model can be simulated
very easily by a 1-dim cyclic rewriting system: a rule of the form $(q, x) \mapsto (q', y)$
can be simulated by the application of a rule $qx \mapsto yq'$ followed by rotation rules
of the form $a \mapsto a$, for all symbols $a$ which are not control states.

Hence, in order to solve the problem ($\Pi$) for cyclic rewriting, it is necessary
to restrict the considered class of rewriting systems. A typical restriction is to
consider controlled rewriting systems corresponding to control loops. A control
loop is a set of rules

$r_1 : (q_1, x_1) \mapsto (q_1', y_1)$
$\vdots$
$r_m : (q_m, x_m) \mapsto (q_m', y_m)$

such that (1) $\forall i, j \in \{1, \ldots, m\}$ with $i \neq j$, $q_i \neq q_j$ and $q_i' \neq q_j'$, (2) $\forall i \in
\{1, \ldots, m-1\}$, $q_i' = q_{i+1}$, and (3) $q_m' = q_1$.

Boigelot et al. have shown the following result:

Theorem 2 ([BGWW97]). Let $R$ be a 1-dim control loop. Then, for every
effectively recognizable set $\phi$, the set $R_c^*(\phi)$ is effectively recognizable.

For systems of higher dimensions (even for 2-dim systems), the $R_c^*$ image
is not recognizable, in general. Indeed, consider for instance the self-loop $R =
\{(q, \epsilon, \epsilon) \mapsto (q, a, a)\}$. Then, $R_c^*(q, \epsilon, \epsilon) = \{(q, a^n, a^n) : n \geq 0\}$ which is a
non-recognizable set.

[BGWW97] provides a characterization of the control loops $R$ such that $R_c^*$
preserves recognizability, as well as an algorithm which constructs for such loops
a finite automaton representing the $R_c^*$ image of any given recognizable set.

In [BH97] we show that the effect of iterating any control loop can be characterized
using representation structures defining a class of non-recognizable
sets enjoying all needed closure and decision properties for symbolic reachability
analysis. These structures, called CQDDs, correspond to a class of constrained
(products of) deterministic automata. The constraints we consider are expressed
in Presburger arithmetic and concern the number of times transitions of the automata
are taken in the accepting runs. For instance, the set $R_c^*(q, \epsilon, \epsilon)$ above
can be defined as a product of two automata $A_1$ and $A_2$, each of them recognizing
the language $a^*$, under the constraint imposing that the numbers of $a$-transitions
taken in the two automata are the same (see [BH97] for more details on
CQDDs). We have the following result:

Theorem 3 ([BH97]). Let $R$ be an n-dim control loop. Then, for every effectively
CQDD representable set $\phi$, the set $R_c^*(\phi)$ is effectively CQDD representable.

A similar theorem can be shown for prefix rewriting, i.e., the class of CQDDs
is closed under $R_p^*$ for any n-dim control loop $R$.

As mentioned in Section 3.2, cyclic (controlled) rewriting systems are suitable
for modeling systems communicating through FIFO channels, e.g., communication
protocols. In many cases, the purpose of these protocols is to ensure a
perfect data transfer through unreliable channels. Hence, it is natural in this
context to consider models where channels are lossy in the sense that they can
lose a message at any time. In our setting, the lossiness assumption can be taken
into account by considering a weak cyclic rewriting relation, where configurations
can get smaller according to the subword relation (meaning that some symbols
are lost), before and after any cyclic rewriting step.

Let $\preceq \subseteq \Sigma^* \times \Sigma^*$ be the subword relation, i.e., $a_1 \ldots a_n \preceq b_1 \ldots b_m$ if there
exist $i_1, \ldots, i_n \in \{1, \ldots, m\}$ such that $i_1 < \ldots < i_n$ and $\forall j \in \{1, \ldots, n\}$. $a_j = b_{i_j}$.
We consider the product generalization of this relation to vectors of words.

Let $R$ be an n-dim rewriting system over $\Sigma$. We define the weak cyclic rewriting
relation $R_{wc} \subseteq (\Sigma^*)^n \times (\Sigma^*)^n$ as follows: for every $u, v \in (\Sigma^*)^n$, $(u, v) \in R_{wc}$ if
and only if there exist $u', v' \in (\Sigma^*)^n$ such that $u' \preceq u$, $v' \in R_c(u')$, and $v \preceq v'$.
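
For finite sets of 1-dim words, the subword relation and one weak cyclic rewriting step can be sketched as follows. Words are Python strings and rules are (lhs, rhs) pairs; the names `is_subword`, `down` and `weak_cyclic` are illustrative assumptions.

```python
from itertools import combinations


def is_subword(u, v):
    """True if u is obtained from v by deleting zero or more symbols."""
    it = iter(v)
    return all(ch in it for ch in u)


def down(words):
    """All subwords of the words of a finite set (its downward closure)."""
    return {"".join(w[i] for i in idx)
            for w in words
            for k in range(len(w) + 1)
            for idx in combinations(range(len(w)), k)}


def weak_cyclic(rules, words):
    """One weak cyclic step: lose symbols, rewrite cyclically, lose symbols again."""
    lossy = down(words)  # every u' below some u
    images = {w[len(x):] + y
              for w in lossy for (x, y) in rules if w.startswith(x)}
    return down(images)  # every v below some v'
```

The explicit enumeration in `down` is exponential in the word length, so the sketch is only meant for experimenting with short words.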

An n-dim language $L$ is downward closed w.r.t. the subword relation if $\forall u, v \in
(\Sigma^*)^n$, if $v \in L$ and $u \preceq v$, then $u \in L$. Let $L\downarrow$ denote the downward closure
of $L$, i.e., the smallest downward closed set which includes $L$. Clearly, for every
rewriting system $R$ and every set $\phi$, the set $R_{wc}^*(\phi)$ is downward closed. Hence,
by showing that every downward closed set w.r.t. $\preceq$ is a recognizable set, the
following fact can be deduced.

Theorem 4 ([ACJT96, CFI96]). For every rewriting system $R$, and every set
$\phi$, the set $R_{wc}^*(\phi)$ is a recognizable set.

Theorem 4 does not say that the set $R_{wc}^*(\phi)$ is constructible, even though it
is recognizable. Actually, we know from the results in [May00] that:

Theorem 5. There is no algorithm which constructs the set $R_{wc}^*(\phi)$ for any
given 1-dim controlled rewriting system $R$ and recognizable set $\phi$.

We can refine Theorem 4 by defining representation structures which capture
precisely the class of downward closed sets. These representation structures
correspond to a particular subclass of regular expressions called simple regular
expressions (SRE for short). Their definition is as follows: Let us call atomic
expression any expression of the form $(a + \epsilon)$ where $a \in \Sigma$, or of the form $A^*$ where
$A \subseteq \Sigma$. A product is either the empty word $\epsilon$, or a finite sequence $e_1 \cdots e_m$ of
atomic expressions. Then, an SRE is either $\emptyset$, or a finite union $p_1 + \ldots + p_n$ of
products. An n-dim SRE set is a finite union of Cartesian products of n SRE sets.
It is very easy to see that every SRE set is downward closed w.r.t. the subword
relation. Conversely, by showing that for every recognizable set $L$, the set $L\downarrow$ is
effectively SRE representable, we obtain the following fact:

Theorem 6 ([ABJ98]). SRE sets are precisely the downward closed sets w.r.t.
the subword relation.
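
The easy direction of this statement can be probed on small instances. The sketch below assumes a hypothetical encoding of 1-dim SREs as lists of products, each product being a list of atoms `("opt", a)` for an optional symbol or `("star", A)` for the star of a nonempty set of symbols. It translates an SRE into a Python regular expression and checks downward closure for all words up to a given length.

```python
import re
from itertools import combinations, product as cartesian


def atom_regex(atom):
    kind, symbols = atom
    if kind == "opt":  # optional single symbol
        return "(?:%s)?" % re.escape(symbols)
    return "[%s]*" % re.escape(symbols)  # star of a nonempty set of symbols


def sre_regex(products):
    """Regex for a union of products; the empty list denotes the empty set."""
    if not products:
        return r"(?!)"
    return "|".join("(?:%s)" % "".join(atom_regex(a) for a in p)
                    for p in products)


def member(products, w):
    return re.fullmatch(sre_regex(products), w) is not None


def subwords(w):
    return {"".join(w[i] for i in idx)
            for k in range(len(w) + 1)
            for idx in combinations(range(len(w)), k)}


def downward_closed_upto(products, alphabet, n):
    """Check downward closure on all words over `alphabet` of length <= n."""
    for k in range(n + 1):
        for t in cartesian(alphabet, repeat=k):
            w = "".join(t)
            if member(products, w) and not all(member(products, s)
                                               for s in subwords(w)):
                return False
    return True


# Example SRE with two products: optional a followed by {b,c}*, and a* followed by optional c.
sre = [[("opt", "a"), ("star", "bc")], [("star", "a"), ("opt", "c")]]
```

Such a bounded check gives no guarantee beyond the tested length; it is only meant to make the definition concrete.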

The class SRE has interesting properties which make it suitable for efficient
reachability analysis of lossy FIFO-channel systems.

Theorem 7 ([ABJ98]). The class of SRE sets is closed under union, intersection,
and application of regular relations (e.g., rewriting relations). Moreover,
the inclusion problem for SREs can be solved in polynomial time.

Notice that the class of SREs is not closed under complementation. Indeed,
complements of SRE languages are upward closed languages w.r.t. the
subword relation. They correspond to finite unions of languages of the form
$\Sigma^* a_1 \Sigma^* a_2 \cdots a_n \Sigma^*$.

Theorem 8 ([ABJ98]). Let $R$ be an n-dim control loop. Then, for every SRE
set $\phi$, it is possible to construct an SRE representation of the set $R_{wc}^*(\phi)$ which
has a polynomial size w.r.t. the size of $\phi$.

Based on the two theorems above, we derive a symbolic reachability analysis
procedure as described in Section 2. This procedure has been implemented and
used to analyze in a fully automatic manner several infinite-state models of
communication protocols such as the alternating bit protocol, the sliding window
protocol, and the bounded retransmission protocol [AAB99, ABJ98].

Now, it is very easy to construct a model for which computing the effect
of control loops does not help the reachability analysis procedure to terminate.
Consider for instance the system $R$ with two rules $r_1 : a \mapsto bb$ and $r_2 : b \mapsto aa$
corresponding to two self-loops (loops on a single control state $q$ which is omitted
here). It can be seen that $R_{wc}^*(a) = a^*b^* + b^*a^*$ (notice that, due to lossiness,
there cannot be any constraints on the numbers of $a$'s and $b$'s in the reachable
configurations). However, it is impossible to compute this set by reasoning about
a finite number of iterated compositions of the two rules of $R$ (control loops
corresponding to compositions of the two considered self-loops). To see this, let
us consider the relation corresponding to any such loop. This relation can
be written as

$\theta = \{r_1\}_{wc}^{m_1} \circ \{r_2\}_{wc}^{n_1} \circ \ldots \circ \{r_1\}_{wc}^{m_k} \circ \{r_2\}_{wc}^{n_k}$

where the $m_i$'s and $n_i$'s are positive integers. It can be checked that, for every
word $w \in \Sigma^*$, $\theta^*(w)$ is always a finite language.

For instance, let $w = babab$. Then, we have

$\{r_1\}_{wc}(w) = \{babb^2, bb^2, bb^3\}\downarrow = \{babb^2, bb^3\}\downarrow$

$\{r_2\}_{wc}(w) = \{ababa^2, aba^3, ba^2, aba^2, a^4, a^3\}\downarrow = \{ababa^2, aba^3, a^4\}\downarrow$

Notice that the number of possible iterations of the relations $\{r_1\}_{wc}$ and $\{r_2\}_{wc}$
is always bounded. It depends on the number of occurrences of $a$'s (resp. $b$'s) in
the initial word $w$.

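As another example, take $\theta = \{r_2\}_{wc} \circ \{r_1\}_{wc}$ and $w = a$. Then, we have

$\theta(w) = \{r_2\}_{wc}(\{b^2\}\downarrow) = \{ba^2\}\downarrow$

$\theta^2(w) = \theta(\{ba^2, a^2, ba, b, a, \epsilon\}) = \{r_2\}_{wc}(\{ab^2\}\downarrow) = \{ba^2\}\downarrow$

Thus, we have $\theta^*(a) = \{ba^2\}\downarrow$.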
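
This small computation can be replayed mechanically. The sketch below, under the assumption that finite downward closed sets are stored explicitly as Python sets of strings, applies the rule a to bb and then b to aa in the weak cyclic sense, and iterates the composed relation from the word a until nothing new appears. The helper names are illustrative.

```python
from itertools import combinations


def down(words):
    """All subwords of the words of a finite set (its downward closure)."""
    return {"".join(w[i] for i in idx)
            for w in words
            for k in range(len(w) + 1)
            for idx in combinations(range(len(w)), k)}


def weak_cyclic(rule, words):
    """Weak cyclic rewriting with a single rule (x, y) on a finite set of words."""
    x, y = rule
    images = {w[len(x):] + y for w in down(words) if w.startswith(x)}
    return down(images)


R1, R2 = ("a", "bb"), ("b", "aa")


def theta(words):
    """Apply r1 first, then r2, both in the weak cyclic sense."""
    return weak_cyclic(R2, weak_cyclic(R1, words))


def theta_star(words):
    """Union of all iterates of theta; stops once no new word is produced."""
    acc = set(words)
    while True:
        nxt = acc | theta(acc)
        if nxt == acc:
            return acc
        acc = nxt


result = theta_star({"a"})
```

The loop in `theta_star` stops here because the iterates stabilize after one application; this matches the observation that iterating such a composed relation yields only finite sets.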

Since $R_{wc}^*(a)$ is an infinite set, and the iteration of each relation $\theta$ of the form
specified above can only produce a finite set of words, it can be concluded that
the reachability analysis procedure using only meta-transitions like $\theta$ does not
terminate in this case.

An interesting question is under which conditions it is possible to compute
the effect of iterating nested control loops. Unfortunately, we have the following
negative result:

Theorem 9 ([ABB01]). There is no algorithm which constructs the set $R_{wc}^*(\phi)$
for any given 1-dim rewriting system $R$ and any set $\phi$.

This means that it is even impossible to compute the effect of sets of self-loops
of the form $(q, x) \mapsto (q, y)$ where $x$ and $y$ are two words over $\Sigma$. To prove
this result, we need rules where the left hand side $x$ is of size 2. However, the
situation is different when this size is assumed to be at most one.

We say that an n-dim rewriting rule $r$ is context-free if $lhs(r) \in (\Sigma \cup
\{\epsilon\})^n$. An n-dim context-free rewriting system is a set of n-dim context-free rules.
For instance, the system $R = \{a \mapsto bb, b \mapsto aa\}$ considered above is a context-free
system. We have the following result:

Theorem 10 ([ABB01]). Let $R$ be a 1-dim context-free rewriting system.
Then, for every effectively SRE set $\phi$, the set $R_{wc}^*(\phi)$ is effectively SRE.

Using Theorem 9 it is very easy to show that the result above cannot be
extended to 2-dim context-free rewriting systems. Therefore, the question is
under which conditions it is possible to construct the effect of n-dim context-free
systems. We propose hereafter one such condition.

A rewriting system $R$ is a ring if, for every rule $r : (x_1, \ldots, x_n) \mapsto (y_1, \ldots, y_n)$
in $R$, $\exists i \in \{1, \ldots, n\}$ such that $\forall j \neq i$. $x_j = \epsilon$ and $\forall j \neq (i+1) \bmod n$. $y_j = \epsilon$.

Thus, each rule $r$ in a ring is either of the form $(\epsilon, \ldots, \epsilon, x_i, \epsilon, \ldots, \epsilon) \mapsto
(\epsilon, \ldots, \epsilon, y_{i+1}, \epsilon, \ldots, \epsilon)$ or of the form $(\epsilon, \ldots, \epsilon, x_n) \mapsto (y_1, \epsilon, \ldots, \epsilon)$. Intuitively,
each rule in these systems corresponds to an action of a FIFO-channel system
where a word $x$ is received from a channel of index $i$, and a word $y$ is sent to the
channel of index $(i+1) \bmod n$.

Theorem 11 ([ABB01]). Let $R$ be an n-dim context-free ring. Then, for every
effectively SRE set $\phi$, the set $R_{wc}^*(\phi)$ is effectively SRE.

4.3 Factor Rewriting 

As mentioned in Section 3.2, factor rewriting rules can be used to represent transitions
in parametrized systems (networks) with an arbitrary number of identical
finite-state components. An interesting class of rewriting rules which appear
in this context are the so-called semi-commutations: A 1-dim rewriting rule
is a semi-commutation if it is of the form $ab \mapsto ba$ where $a, b \in \Sigma$. A
semi-commutation rewriting system is a set of semi-commutation rules.

Semi-commutations are naturally used to model transitions corresponding to
information exchange between neighbors, e.g., token passing protocols for mutual
exclusion (see Section 3.2), leader election algorithms, etc. We present later an
example where semi-commutations appear in the model of a lift controller for an
arbitrary number of floors. In that example, semi-commutation rules correspond
to the actions of moving up or down from one floor to its immediate successor.

It is well known that the class of recognizable sets is in general not closed
under $R_f^*$ where $R$ is a semi-commutation system. For instance, consider the
system $R = \{ab \mapsto ba\}$. Then, it is easy to see that for $\phi = (ab)^*$, the set $R_f^*(\phi)$
is not recognizable.
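
One standard way to see this is to intersect the reachability set with the regular language b*a*: the words of that form reachable from a word consisting of n copies of ab are exactly those with n b's followed by n a's, and the language of all such words is not regular, while regular languages are closed under intersection. The sketch below checks this observation for small n by explicit exploration; the function names are illustrative.

```python
import re


def swap_closure(word):
    """Words reachable from `word` by repeatedly rewriting a factor ab into ba."""
    seen, todo = {word}, [word]
    while todo:
        w = todo.pop()
        for i in range(len(w) - 1):
            if w[i:i + 2] == "ab":
                nxt = w[:i] + "ba" + w[i + 2:]
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return seen


def sorted_part(n):
    """Reachable words from (ab)^n that lie in the regular language b*a*."""
    return {w for w in swap_closure("ab" * n) if re.fullmatch(r"b*a*", w)}


ok = all(sorted_part(n) == {"b" * n + "a" * n} for n in range(1, 5))
```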

Therefore, the question is to find a class of representation structures defining
a subclass of recognizable sets which is closed under iterative semi-commutation
rewriting. As an answer to this question, we propose a subclass of regular expressions
called APC (alphabetic pattern constraints). We define APCs exactly
as the SREs introduced above, except that we also allow in APCs atomic expressions
of the form $a$, where $a \in \Sigma$ (APCs are not downward closed w.r.t. $\preceq$
in general). In other words, APC languages are finite unions of languages of
the form $\Sigma_0^* a_1 \Sigma_1^* \cdots a_n \Sigma_n^*$ where the $a_i$'s are symbols in $\Sigma$ and the $\Sigma_i$'s are
subsets of $\Sigma$. (The class APC coincides with the class of languages on level 3/2
of Straubing’s concatenation hierarchy EM|.) 

The motivation behind the consideration of this particular kind of languages
is that they appear naturally in many specification and verification contexts.
First, APC languages can be used to express properties based on specifying
some patterns appearing within configurations. Typically, negations of (some)
safety properties are expressed by an APC defining the set of all bad patterns.
For example, in the case of the token passing protocol mentioned in Section 3.2,
the set of bad configurations, i.e., all those which do not satisfy the mutual exclusion
property, is defined by $(0+1)^* 1 (0+1)^* 1 (0+1)^*$. Thus, since this set has
empty intersection with the set of reachable configurations $0^*10^*$, it can be concluded
that the mutual exclusion property is satisfied. Furthermore, it turns out
that the reachability sets of many infinite-state systems and parametrized systems,
including communication protocols like the alternating bit and the sliding
window, and parametrized mutual exclusion protocols such as the token ring,
Szymanski's, Burns', or Dijkstra's protocols, are all expressible as APCs (see
[IAfylOslAAPOOIAfi.lNOOI.IN()(ifi.lNT()()tTou()()j V 
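
The emptiness of the intersection used in the token passing example can be sanity-checked on bounded words. The sketch below encodes the bad pattern (at least two tokens) and the reachability set 0*10* as Python regular expressions and enumerates all binary words up to a given length; the bound and the function name are arbitrary choices made for illustration.

```python
import re
from itertools import product

BAD = re.compile(r"[01]*1[01]*1[01]*")  # at least two processes hold the token
REACH = re.compile(r"0*10*")            # reachable configurations


def violations(max_len):
    """Binary words up to max_len that are both reachable and bad."""
    found = []
    for k in range(max_len + 1):
        for t in product("01", repeat=k):
            w = "".join(t)
            if REACH.fullmatch(w) and BAD.fullmatch(w):
                found.append(w)
    return found


bad_reachable = violations(10)
```

An empty result is consistent with the mutual exclusion property; an exact check would instead intersect the two languages symbolically.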

It can be shown that the class APC has the following properties. 

Theorem 12 ([BMT01]). The class of APCs is closed under union, intersection,
and rewriting (application of single rewriting rules), but it is not closed
under complementation. The inclusion problem for APCs is PSPACE-complete.

The main closure result about APCs is the following:

Theorem 13 ([Tou00, BMT01]). Let $R$ be a semi-commutation rewriting system.
Then, for every APC set $\phi$, the set $R_f^*(\phi)$ is effectively APC.

Actually, this result can be slightly extended to systems including symbol
substitutions. We call a symbol substitution rewriting system any set of rules of
the form $a \mapsto b$. First, it is easy to see that APCs are effectively closed under
$R_f^*$ for any symbol substitution rewriting system $R$. The proof of Theorem 13 can
be easily adapted to rewriting systems which are sets of semi-commutations and
symbol substitutions moj. 

Let us illustrate the use of these results on a simple example. We consider a
lift controller which has the following behavior: People can arrive at any time at
any floor and declare their wish to move up or down. The lift is initially at the
lowest floor, and then it keeps moving from the lowest floor to the top one, and
back. In its ascending (resp. descending) phase, it takes all the people who are
waiting to move up (resp. down) and ignores the others. They are taken into
account in the next phase.

For every $n$ (number of floors), a configuration of this system can be represented
by a word of the form

$\# x_1 \cdots x_j \, y \, x_{j+1} \cdots x_n \#$

where $y \in \{a^{\uparrow}, a^{\downarrow}\}$, and $x_i \in \{b^{\uparrow}, b^{\downarrow}, b^{\uparrow\downarrow}, \bot\}$ for $i \in \{1, \ldots, n\}$. The symbol
corresponding to $x_i$ represents the state of floor $i$: $x_i = b^{\uparrow\downarrow}$ if there are
people waiting for moving up and other people (at the same floor) waiting for
moving down, $x_i = b^{\uparrow}$ (resp. $x_i = b^{\downarrow}$) means that there are people waiting at
this floor and all of them want to move up (resp. down), and $x_i = \bot$ means that
nobody is waiting at this floor. The symbol corresponding to $y$ gives the position
of the lift: in the configuration given above, if $y = a^{\uparrow}$ (resp. $y = a^{\downarrow}$) then the
lift is at floor $j+1$ (resp. $j$), and it is moving up (resp. down).

The set of all initial configurations, for an arbitrary number of floors, is the
set of words $\phi_0 = \# a^{\uparrow} \bot^* \#$, which means that initially, the lift is at the lowest
floor and there are no requests at any floor. The dynamics of the system can be
modeled by the following rewriting rules:



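$\bot \mapsto b^{\uparrow}$  (1)
$\bot \mapsto b^{\downarrow}$  (2)
$b^{\uparrow} \mapsto b^{\uparrow\downarrow}$  (3)
$b^{\downarrow} \mapsto b^{\uparrow\downarrow}$  (4)
$a^{\uparrow} \bot \mapsto \bot a^{\uparrow}$  (5)
$a^{\uparrow} b^{\downarrow} \mapsto b^{\downarrow} a^{\uparrow}$  (6)
$a^{\uparrow} b^{\uparrow} \mapsto \bot a^{\uparrow}$  (7)
$a^{\uparrow} b^{\uparrow\downarrow} \mapsto b^{\downarrow} a^{\uparrow}$  (8)
$a^{\uparrow} \# \mapsto a^{\downarrow} \#$  (9)
$\bot a^{\downarrow} \mapsto a^{\downarrow} \bot$  (10)
$b^{\uparrow} a^{\downarrow} \mapsto a^{\downarrow} b^{\uparrow}$  (11)
$b^{\downarrow} a^{\downarrow} \mapsto a^{\downarrow} \bot$  (12)
$b^{\uparrow\downarrow} a^{\downarrow} \mapsto a^{\downarrow} b^{\uparrow}$  (13)
$\# a^{\downarrow} \mapsto \# a^{\uparrow}$  (14)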



Rules 1, 2, 3, and 4 are symbol substitutions modeling the arrival of users. 
Let us call request their corresponding action. Rules 5 and 6 (resp. 10 and 11) 
are semi-commutations modeling the moves of the lift upward (resp. downward). 
They correspond to the action move-up (resp. move-down). Rules 7 and 8 (resp. 
12 and 13) represent the action of taking at some floor the people who want 
to move up (resp. down). We call the corresponding actions take-up (resp. take- 
down). Finally, rules 9 and 14 represent the actions of switching from the ascend- 
ing to the descending phase (action up2down), and vice-versa (action down2up). 

Table 1 shows the computation of the reachable configurations of the lift controller
according to a depth first search strategy with priority to meta-transitions
(we omit some unimportant steps). The meta-transitions used are request*, corresponding
to the relation $\{1 \cup 2 \cup 3 \cup 4\}_f^*$, move-up*, corresponding to $\{5 \cup 6\}_f^*$, and
move-down*, corresponding to $\{10 \cup 11\}_f^*$. The image by request* is easy to compute
(APCs are effectively closed under iterated symbol substitution rewriting),
and the images by move-up* and move-down* are computable by the algorithm
underlying Theorem 13.

As shown in Table 1, the reachability analysis terminates in this case thanks
to the use of meta-transitions. It is worth noting that the reachability analysis
procedure also gives (for free) a finite abstraction of the analyzed infinite-state
model. Indeed, Table 1 defines an abstract reachability graph of the lift controller
which is shown in Figure 1.

Table 1. Reachability Analysis of the Lift Controller

Fig. 1. Abstract Reachability Graph of the Lift Controller



5 Related Work 



Several papers propose symbolic reachability analysis techniques for infinite- 
state systems based on using representations of languages to define sets of con- 
figurations. In these works, sets of configurations are represented by means of 
various kinds of automata, regular expressions, and formulas of monadic first or 
second order logics (see e.g., [IH( A)ti|HRT ^11)7IHH()7IH(IWWT77IKMM+97IWT?T;?^ 

Papers such as [KMM+97, WB98, BJNT00, PS00] introduce a uniform verification
paradigm for infinite-state systems, called regular model checking, based
on the use of regular languages (finite automata or WS1S formulas) as symbolic
representations, and of regular relations (finite transducers or formulas) as models
of transition relations of systems. The concepts we present in this paper are
very close to those developed for regular model checking. However, we can make
the following comparison between the two frameworks.

First, we do not require here that the manipulated languages are regular.
For instance, the results of [BH97] show that representation structures defining
non-regular languages can be used and they are needed for some applications.

Moreover, the verification approach adopted in, e.g., [BJNT00, PS00] consists
in constructing (when possible) transitive closures of regular relations (i.e., given
a regular relation $\rho$, construct a representation of $\rho^*$, as a finite transducer for
instance). This problem is more general and of course harder than the problem
($\Pi$) we have considered in this paper (see Section 2), which is to construct the
image of a given set $\phi$ by $\rho^*$. Indeed, there are many cases where $\rho^*(\phi)$ is computable
for every $\phi$ in some class of languages, whereas $\rho^*$ is not constructible,
or at least, not regular (e.g., for relations induced by semi-commutation rewriting
systems [BMT01]). Nevertheless, in the context of regular model checking, interesting
classes of relations for which the transitive closure is computable have been
identified in, e.g., [ABJN99, JN00]. Other works propose incomplete procedures
for computing transitive closures of relations [BJNT00, PS00, DLS01].

Also, for the sake of simplicity, we have considered in this paper only special 
kinds of rewriting systems (for instance, these rewriting systems cannot define 
all the relations considered in [ABJN99, JN00, BJNT00]). Of course, more general 
forms of rewriting systems can be used within the framework we present. 

The symbolic reachability analysis approach we describe in this paper uses 
the concept of meta-transition introduced in [BW94] in order to help termination. 
This technique can be seen as a fixpoint acceleration in the context of abstract 
interpretation fTTTTT] . However, these works use widening operators which 
lead in general to the computation of an upper-approximation of the reachability 
set, whereas the results we present in this paper allow exact computations to be 
performed. It is worth noting that widening operations are defined depending 
only on the intermediary sets which are generated during the computation of 
the reachability set, regardless of the applied actions. In contrast, the approach 
we adopt here for acceleration takes into account the applied actions (rewriting 
rules) in order to compute the exact effect of their iteration. In [B.TNTflfll 
rTCT?i , widening techniques on automata and transducers are defined for regular 
model-checking. 

The use of rewriting systems as models for infinite-state systems has been 
considered for instance in [Cau92, Mol96, May98]. These works address different 
questions from the one considered here. They are concerned with the decidability 
and the complexity of behavioral equivalences such as bisimulation [Cau92, 
Mol96] or model-checking against various propositional temporal logics [May98]. 
Rewriting systems are also used to model parametrized networks of identical 
processes in !Fnn7| where rewriting techniques are applied for invariant checking, 
but no algorithms for automatic computation of the closure of languages by 
rewriting systems are provided. 

Finally, we have considered in this paper only rewriting systems on words. 
The approach we present can also be extended to rewriting systems on other 
structures such as trees, rings, grids, and graphs in general, in order to deal 
with wider classes of systems. Let us mention some of the few existing results 
on this topic. In [KMM+97], an extension of the regular model-checking framework 
to the case of tree languages is proposed in order to verify parametrized 
networks with a tree-like topology. However, this paper does not provide 
acceleration techniques for reachability analysis. In lESEq, tree automata are used to 
characterize reachability sets (sets of terms) for a class of processes with parallel 
and sequential composition which subsumes the class of context-free processes. 
Finally, we show in that Theorem El about closure under iterated 
semi-commutation rewriting can be generalized to the case of rings (circular 
words). 


Abstract. According to the viewpoint model of software systems development, 
abstract models of different views of the systems are constructed. 
This separation of concerns reduces the complexity of the development, 
but raises the question of their integration, i.e., the conception of 
a collection of heterogeneous models as a complete specification of a 
system. The integration can be achieved by using a common semantic 
domain for the interpretation of all models, where each viewpoint model, 
due to its partiality, admits a set of possible interpretations. In this paper 
such an integrating semantic domain is sketched and an application 
to structure and behaviour models of the Unified Modeling Language is 
discussed. 



1 Introduction 

The viewpoint model of software systems development that is nowadays accepted 
as a predominant approach to rational systems development comprises two main 
features. First, the development process should be based on models. That means, 
abstract representations of the functionality and behaviour of the system are 
provided from the very beginning and maintained during the whole life cycle 
of the system. This yields a concise documentation of the decisions taken in 
each step of the design and reduces the complexity of the development and 
maintenance of the system by abstraction from details. Second, these models 
should not represent the system in its entirety, but focus on specific aspects, 
like the static structure of a component or element of the system, its internal 
behaviour or cooperation with other components or elements of the system, 
etc. This distinction of viewpoints contributes to the separation of concerns in 
the development of the system and thus yields a reduction of its complexity 
orthogonal to the contribution of the modelling. 

The viewpoint model, introduced in the Reference Model of Open Distributed 
Processing RM-ODP [2til‘/!.'-i|, is most prominently realized by the different 
diagram languages provided by the Unified Modeling Language UML PE3>, which 
has become the de facto standard in object-oriented modelling. It supports the 
basic distinction of structure and behaviour mentioned above, and adds further 
modelling languages for the implementation stage. 

The separation of different viewpoints and their (independent) modelling, 
however, immediately raises the question of their interrelations. On the one 
hand, each viewpoint model in itself only partially specifies the system, due to 
its focus on one aspect. This implies that the system is underspecified by each 
of the viewpoint models. In order to derive the complete specification of the 
system from the viewpoint models they have to be conceptually integrated and 
correspondences of the different models have to be established. On the other 
hand, the viewpoints are not completely orthogonal to each other. That means, 
certain aspects of the system will be specified by more than one model. Thus 
the consistency of the collection of viewpoint models has to be checked. This 
becomes even harder since the same aspect is specified in very different ways, 
using (paradigmatically) different languages. Thus even a formal definition of 
consistency is not obvious. Since by definition the languages will have different 
semantic domains one cannot request the existence of a common model as a 
criterion for consistency. (See for instance 0 for a discussion in the context of 
RM-ODP.) 

In the UML the integration of the different models is addressed by the meta- 
modelling approach. For the definition of the languages a meta model is given 
that is itself a UML class diagram with constraints. The instances of the meta 
model are the well-formed models of the UML. Whenever model elements are 
instances of the same meta model element they may establish a correspondence 
between the models in the sense discussed above. Beyond the problem of self 
reference, however, i.e., defining new constructs in terms of yet undefined con- 
structs, it is obvious that this approach addresses only the syntactic level. (The 
static semantics given by the well-formedness rules is a precondition for the 
semantics definition, but not the semantics definition itself.) In particular, con- 
sistency can hardly be defined or checked based on this description. 

An alternative approach uses an independent semantic domain — an internal 
system model or a reference model — where all models can be interpreted. Ob- 
viously, such a domain must be sufficiently expressive to support the semantic 
interpretation of the different languages. Moreover, it must support composi- 
tion operations (representing object collaboration by composition of their state 
machines for instance) and refinement or other development operations for the 
iterative and traceable development of concrete design models from abstract 
requirements specifications. Using a common semantic domain for the 
interpretation of all models, the definitions of consistency and correspondence of 
collections of viewpoint models are supported immediately. Basically, consistency can 
now be reconstructed as ‘having a common semantic interpretation’. However, 
some transformations might be required to achieve this, again due to the dif- 
ferent viewpoints. A class diagram for instance specifies collections of objects, 
whereas a state machine diagram specifies single objects. Thus appropriate em- 
beddings or projections are required that must be provided by the semantic 
domain. Furthermore, in a state machine events and actions may occur within 
one step, whereas in a sequence diagram send actions and receipt events are 
different steps. Accordingly, refining or abstracting operations or relations must 
be provided. 

An integrating semantics for all models requires lifting the interpretation of 
all of them to the level of full system specifications. That means, on the one hand, 
that an interpretation of a class diagram for instance that specifies the static 
structure is given by system models that also incorporate dynamic behaviour. 
Analogously, a state machine diagram that specifies dynamic behaviour must be 
interpreted with the structural aspects of data states and operation instances 
etc. On the other hand, the partiality of viewpoint models implies that a model 
will never represent a unique full system in this sense. Rather, the missing (com- 
plementary) information may be interpreted arbitrarily, which yields a set of 
interpretations that is admissible from the given point of view. Integration then 
consists of relating these sets of locally admissible interpretations of the single 
viewpoint models by the appropriate transformations. Their intersection then 
yields information about their consistency (non-empty intersection), correspon- 
dences (identical images of specification elements in common interpretations), 
and global under-specification (more than one model in the intersection). 
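
The integration by intersection described above can be illustrated by a minimal sketch. It assumes, purely for illustration, that full system models are represented by hashable identifiers and that the admissible sets of all viewpoint models have already been transformed into this common representation; the function name `integrate` and the model names `m1` to `m5` are hypothetical and do not stem from the paper.

```python
from typing import FrozenSet, Iterable


def integrate(admissible_sets: Iterable[FrozenSet[str]]) -> dict:
    """Intersect the sets of admissible interpretations of all viewpoint models."""
    sets = list(admissible_sets)
    if not sets:
        raise ValueError("at least one viewpoint model is required")
    common = frozenset.intersection(*sets)
    return {
        "common": common,
        # Consistency: the viewpoint models share at least one interpretation.
        "consistent": len(common) > 0,
        # Global under-specification: more than one model is left in the intersection.
        "underspecified": len(common) > 1,
    }


# Hypothetical sets of admissible full system models, one set per viewpoint model.
class_diagram = frozenset({"m1", "m2", "m3"})
state_machine = frozenset({"m2", "m3", "m4"})
sequence_diagram = frozenset({"m3", "m5"})

result = integrate([class_diagram, state_machine, sequence_diagram])
```

In this toy setting the three viewpoint models are consistent and jointly determine a single system model, whereas the first two alone would leave the system under-specified.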

In this paper a semantic approach is sketched that supports these aims. It is 
based on transformation systems that are transition systems where both states 
and transitions are labelled by constructs representing the corresponding struc- 
tural parts. These are data states (attribute values, etc.) and action structures 
(event and action instances, sequences of actions, etc.) for the states and transi- 
tions respectively. Composition operations and development relations are defined 
for transformation systems corresponding to the requirements on an integrating 
semantic domain discussed above. Then the interpretation of class diagrams, 
state machine diagrams, and sequence diagrams and their integration is dis- 
cussed. The corresponding UML definitions are used, but due to the expository 
character of this paper only very few concepts are considered of course. More- 
over, full formal semantics are not aimed at, nor shall completely new semantics 
be defined in this approach. The idea is rather to use existing (formal) semantics 
definitions as far as possible and reconstruct them in the integrating approach 
presented here. The discussion on the semantics of the UML for instance is still 
quite open, and there are lots of efforts to define precise semantics for specific 
modelling languages of the UML. Quite a few approaches on the other hand 
address an integrating semantics for all languages. In m an approach based 
on stream processing functions is introduced, stressing also the set semantics 
of viewpoint specifications. The approach presented in is based on algebras 
as system models. Both are to a certain extent biased in that either behaviour 
or structure are stressed, whereas the complementary parts have to be added 
or encoded into the offered constructs. One of the ideas of the transformation 
system approach is to reconcile these approaches. 

A bridge between the meta-modelling approaches and the semantic approaches 
is built by the pUML group at present (see 
http://www.cs.york.ac.uk/puml/), in particular via its activity in the meta 
modelling language development (see |H|). Precise semantics both for individual 
languages and the UML as a whole obtained by using other formal methods or 
general theoretical investigations are incorporated into the realm of the UML by 
meta-modelling them. In m for instance the formal definition of generalization 
defined in [3 via a transformation of UML models to Z specifications is reflected 
in a meta model extension. As mentioned above, further research is necessary 
to obtain corresponding results especially for the dynamic models of the UML. 

The paper is organized as follows. In the next section transformation systems, 
their composition operations and development relations are introduced (see IH 
EOl for more detailed presentations). The semantics for class diagrams and state 
machine diagrams are discussed in Sect. 3 and Sect. 0 respectively by defining 
the corresponding sets of admissible interpretations. For the latter also compo- 
sition is discussed, i.e., the construction of systems from objects. An analogous 
semantic investigation of sequence diagrams is sketched in Sect. El Section El 
concludes. 

Acknowledgments. Thanks to the members of the project IOSIP at the TU 
Berlin, especially Daniel Parnitzke, Jenni Tenzer, Aliki Tsiolakis, and Mesut 
Ozhan. 

2 Transformation Systems 

Transformation systems are used as formal mathematical models for the repre- 
sentation of single dynamic entities. These can be whole systems, subsystems, 
objects, threads, methods, etc., which means that the granularity of the model 
is not prescribed by the semantic domain. Rather, the syntactic entities, i.e., 
the specifications according to their specific modelling techniques determine the 
scopes and granularities of the models. 

In their formal structure transformation systems reflect the very general dual- 
ity of dynamic behaviour and static structure. Basically, a transformation system 
is an extended labelled transition system, where both states and transitions are 
labelled. That means, operational models are used as the first semantic level at 
which integration is sought. In contrast with the denotational semantics introduced 
in m this yields a more intuitive representation with explicit modelling of the 
structural aspects, too. The behavioural part of a transformation system is rep- 
resented by an unlabelled transition system, i.e. a transition graph, given by 
sets of control states and transitions. These are abstractions, i.e., they are just 
elements of a set whose construction does not matter. Control states model the 
possibility of state inspections and synchronization points. Transitions model 
the atomic, non-interruptible steps of the entity, which can also be used for the 
synchronization with other systems in the sense of a synchronous execution of 
their steps. 

The transition system is labelled then by data states and action structures 
for control states and transitions respectively, representing their corresponding 
internal structures. It is important to note that these labels are not simply 
given by some set, as in usual labelled transition systems. Instead, appropriate 
presentation means like signatures for the data states are offered to represent 
all structural aspects in the labels, both for states and for transitions. Thereby 
also languages are given to state properties of the data states. These enriched 
labels yield the required flexibility to focus on the behaviour or the structure 
of some entity, depending on the concerned modelling technique that is to be 
interpreted. 

2.1 Data States, Transformations, and Data Spaces 

The general definition of transformation systems is generic w.r.t. the concrete 
formal structures used to represent data states and action structures. In the 
simplest case a data state is conceived as a list of values for some given list 
of attributes (of an object) or program variables (of a procedure or a program). 
The signature that is the precondition for this definition is then given by a list 
of typed syntactic entities (attributes, program variables), where a fixed set of 
types is assumed to be given. This implicitly also yields the language for the 
formulation of properties of data states. 
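
The simplest case can be sketched as follows. This is an illustrative encoding only: the signature is assumed to be a mapping from attribute names to Python types, a data state is a mapping from the same names to values, and properties are plain predicates. The names `is_data_state`, `satisfies`, `name` and `lease_time` are hypothetical.

```python
from typing import Any, Callable, Dict

Signature = Dict[str, type]   # typed syntactic entities: attribute name -> type
DataState = Dict[str, Any]    # attribute name -> current value


def is_data_state(signature: Signature, state: DataState) -> bool:
    """A data state gives each attribute of the signature a value of its type."""
    return state.keys() == signature.keys() and all(
        isinstance(state[name], typ) for name, typ in signature.items()
    )


def satisfies(state: DataState, prop: Callable[[DataState], bool]) -> bool:
    """Evaluate a property, formulated over the attribute names, in a data state."""
    return bool(prop(state))


# Hypothetical signature and data state of a client object.
client_signature: Signature = {"name": str, "lease_time": int}
client_state: DataState = {"name": "C1", "lease_time": 3600}
```

The signature fixes both which states are well-formed and which properties can be stated at all, since a property may only mention the attribute names it declares.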

Considering partial algebras of a given algebraic signature as data states 
instead of lists of values allows the representation of further structural aspects. 
For instance, built-in or user defined static data types with their operations can 
be made explicit, mutable sets representing creation and deletion of elements, 
or parameterized attributes like arrays or abstract queries as promoted in m 
can be used. The algebraic signature also yields terms and equations or logical 
formulas to denote elements and state properties of these data states. Within 
the signature static parts, such as built-in data types like integers and strings, 
can be designated. Non-static, mutable constants yield the syntactic entities 
corresponding to attributes or program variables as in the list-of-values data 
states above. The interpretation of these constants in some algebra representing 
a specific data state then yields the actual value of the attribute in this state. 

Finally, arbitrary other structures can be used as data states, which is made 
precise by considering institutions as abstract logical frameworks for data 
states. An institution provides signatures, and sets of sentences as well as model 
classes (or categories) for the signatures. The latter are related by satisfaction 
relations $\models_\Sigma$ that define whether a model $M$ of some signature 
$\Sigma$ satisfies a sentence $\varphi$, denoted $M \models_\Sigma \varphi$. 
Algebraic signatures with total or partial algebras as models and equations or 
conditional equations as sentences with the usual definition of satisfaction yield 
institutions for instance. Other examples are the above mentioned lists of values 
for typed syntactic entities as signatures, or other logical frameworks with 
signatures, sentences (formulas), models (structures), and satisfaction relations. 
In the context of semantics for object-oriented models system snapshots are used 
as data state models of appropriate signatures, representing sets of objects and 
their links (see Sect. 3). 

Corresponding to the data states different action structures can be used as 
transition labels in a transformation system, with appropriate signatures, too. 
Often single actions are considered, either atomic ones given by some set or 
parameterized actions like operation calls. In the latter case the signature 
introduces the names of the operations and their parameter type lists. Usually 
also an invisible action (often called $\varepsilon$ or $\tau$ as in process calculi) 
is considered to represent internal non-observable steps of the entity. If parallel 
execution of actions is to be modelled within single steps, sets of actions can be 
used, where 
the empty set would then correspond to the internal action. Another encapsu- 
lation is achieved by using strings of actions, representing the sequential but 
non-interruptible sequence of actions within one step. This is particularly im- 
portant in refinements, when an abstract representation of some computation 
step is refined by a sequence of steps in a more concrete model. Other modelling 
techniques (like statecharts for instance) use further structure to decorate tran- 
sitions, given by triples of events, guards, and action sequences. The duality of 
events and actions yields the basis for the composition of such models (cf. the 
discussion in Sect. E|), analogous to the duality of input and output actions in 
process calculi like CCS psi- 

Data states and action structures together constitute the data space of a 
transformation system, representing the complete space of possible states and 
state changes. Therein the conjoined labels of a transition $t : c \to d$ of control 
states $c$ and $d$ yield a data state transformation $T : C \Rightarrow D$, given by 
a commencing data state $C$ (the data state label of $c$), an action structure $T$ 
(the label of $t$), and an ending data state $D$ (the label of $d$). The transition 
$t : c \to d$ together with this transformation $T : C \Rightarrow D$ is considered 
as a step of the system. 
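
As a hedged illustration of these notions, the following sketch encodes a transformation system as a transition graph with two labelling maps and computes the step belonging to a transition. The encoding (dictionaries as data states, sets of action names as action structures) and all names such as `TransformationSystem`, `step` and `inc` are assumptions made for this example, not definitions from the paper.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Transition:
    name: str
    source: str   # control state c
    target: str   # control state d


@dataclass
class TransformationSystem:
    control_states: Set[str]
    transitions: List[Transition]
    data_state: Dict[str, Dict[str, Any]]   # label of each control state
    action: Dict[str, FrozenSet[str]]       # label of each transition

    def step(self, t: Transition) -> Tuple[Dict[str, Any], FrozenSet[str], Dict[str, Any]]:
        """Return the data state transformation T : C => D that labels t : c -> d."""
        return (self.data_state[t.source], self.action[t.name], self.data_state[t.target])


# Hypothetical one-step system: a counter that is incremented once.
inc = Transition("t", "c", "d")
system = TransformationSystem(
    control_states={"c", "d"},
    transitions=[inc],
    data_state={"c": {"x": 0}, "d": {"x": 1}},
    action={"t": frozenset({"inc"})},
)
commencing, label, ending = system.step(inc)
```

Using a set of action names as transition label matches the remark above that the empty set can play the role of the internal action.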

2.2 Morphisms and Development Relations 

Transformation systems can be related by appropriate morphisms, i.e., structure 
preserving mappings. Following their formal structure such a morphism is given 
by a graph homomorphism to relate the transition graphs and a forgetful func- 
tor relating the data spaces. The latter is thereby induced by a corresponding 
morphism of the data space signatures. These two mappings must be compatible 
with each other in the sense that the labels are preserved up to restriction w.r.t. 
the forgetful functor. 

These morphisms can now also be used to define development relations of 
software models that are formally represented as transformation systems. An 
injective morphism (with appropriate side conditions) $S \to S'$ can be interpreted 
as a reduction of $S'$ by $S$. The reducing system $S$ may be for instance more 
deterministic, i.e., closer to an implementation, and have a finer internal struc- 
ture (additional private variables or attributes for instance). Composing the two 
morphisms (transition graph homomorphism and data space forgetful functor) 
in opposite directions an extension can be modelled. The extending system of- 
fers further behaviour but preserves the behaviour of the given system, as in an 
inheritance relation (as discussed in nq for instance). Finally, closure operations 
that yield, for instance, sequences of steps as atomic steps again can be used in 
combination with the extension relation to define (sequential) refinements. 

2.3 Composition of Transformation Systems 

The definition of the composition of transformation systems as formal models 
of local system components comprises two tasks. First the connection of the 
components has to be defined, i.e., the architecture of the system is specified. 
Second, the result of the application of the composition operation to the inter- 
connected components has to be defined. That means, a single transformation 
system must be given that represents a global view on the composed system. 
Abstracting thus from the internal architecture of the system supports 
structural transparency. 

A connection relation for two transformation systems is defined by an identi- 
fication relation for their structures and a synchronization relation for their be- 
haviours. The former states which parts of the data states and actions are shared 
by the two systems. Shared (pervasive) static data types, shared variables, and 
commonly executed actions (handshakes, event/action synchronizations, etc.) 
are specified here. The synchronization relation states which control states are 
compatible with each other in the sense that they can be entered by the two 
components at the same time forming one global state. This contains a consis- 
tency condition of synchronization and identification relation: synchronous con- 
trol states must have identical shared data parts. The synchronization of steps is 
represented by the synchronization relation on the transitions. Again, this must 
be compatible with the transformational behaviour of the synchronized steps in 
the sense that shared parts are transformed in the same way. 

The global view of such a connection is then given as follows. The transition 
graph is given by all synchronous pairs of control states and transitions of the two 
components respectively. That means, it is a subgraph of the cartesian product 
of the two local transition graphs. The signature of the global data space is 
given by the disjoint union of the local signatures, factorized by the congruence 
generated by the identification relation. Correspondingly, a global data space 
of a control state $(c_1, c_2)$ is given by the amalgamation (cf. [E]) of their local 
data states. That means, each constant (value), function, and carrier set (type) is 
taken from the local data state that provides it. If both contain the corresponding 
item due to the sharing expressed by the identification relation the consistency 
condition ensures that these coincide, i.e., the amalgamation is well defined. As 
global action structure for a transition $(t_1, t_2)$ the union of the local ones is used, 
taking into account the identification of actions according to the identification 
relation. 
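
The construction of the global view can be sketched for the simplest setting. The sketch assumes that the identification relation is given implicitly by equal names in both components and that a synchronous pair of transitions must connect synchronous pairs of control states; both are simplifying assumptions of this example, as are the function names `global_graph` and `amalgamate` and the client/server data.

```python
from typing import Any, Dict, Iterable, Set, Tuple

Transition = Tuple[str, str, str]   # (name, source control state, target control state)


def global_graph(
    transitions1: Iterable[Transition],
    transitions2: Iterable[Transition],
    sync_states: Set[Tuple[str, str]],
    sync_transitions: Set[Tuple[str, str]],
):
    """Build the subgraph of the product graph given by the synchronous pairs."""
    by_name1 = {name: (src, tgt) for name, src, tgt in transitions1}
    by_name2 = {name: (src, tgt) for name, src, tgt in transitions2}
    result = set()
    for n1, n2 in sync_transitions:
        (s1, d1), (s2, d2) = by_name1[n1], by_name2[n2]
        # Assumption: only pairs running between synchronous control states are kept.
        if (s1, s2) in sync_states and (d1, d2) in sync_states:
            result.add(((n1, n2), (s1, s2), (d1, d2)))
    return set(sync_states), result


def amalgamate(state1: Dict[str, Any], state2: Dict[str, Any], shared: Set[str]) -> Dict[str, Any]:
    """Amalgamate two local data states; shared items must coincide in both."""
    for item in shared:
        if state1[item] != state2[item]:
            raise ValueError(f"shared item {item!r} differs in the local data states")
    return {**state1, **state2}


# Hypothetical components with one synchronized step each.
client = [("send", "idle", "waiting")]
server = [("receive", "listening", "busy")]
sync_states = {("idle", "listening"), ("waiting", "busy")}
sync_transitions = {("send", "receive")}

states, transitions = global_graph(client, server, sync_states, sync_transitions)
merged = amalgamate({"x": 1, "addr": "IP1"}, {"y": 2, "addr": "IP1"}, shared={"addr"})
```

The check in `amalgamate` corresponds to the consistency condition of synchronization and identification relation: it fails exactly when two synchronous control states disagree on a shared data part.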

To specify more general architectures, given by arbitrary numbers of com- 
ponents and connections, the categorical structure of transformation systems 
and morphisms is used. In fact, each connection relation as described above 
yields a cospan of morphisms of transformation systems. (Similar spans, called 
channels, of formal system models are used in to describe architectures 
and superposition. The span-cospan duality is due to the fact that specifications 
are considered as opposed to models.) The global view of a composed system 
is then given by the pullback of the cospan, which also comprises the projec- 
tions of the global view to the local components. The general mechanism for 
the composition of transformation systems is accordingly given by representing 
the architecture by a diagram of transformation systems, specifying the compo- 
nents and their interconnections. The limit of such a diagram then represents 
the global view and its projections to the local components as above. 

3 Class Diagram Semantics 

After this exposition of the semantic domain we can now discuss the interpreta- 
tion of software system models. 

A class diagram specifies the structure of a collection of objects. That means, 
the attributes and the operations of the objects belonging to each of the classes 
are introduced as well as the structural relationships of objects of different 
classes. The latter are specified by inheritance relations and associations, in- 
cluding aggregation and composition. The behaviour of the objects is in general 
not constrained by a class diagram. However, object constraints can be added 
that have an impact on the behaviour, too, for instance as pre- and postconditions 
of methods. Inside the UML these are formulated in the Object Constraint 
Language OCL. 

In Fig. 1 a class diagram for objects of a network protocol is shown. (The 
example is copied from the DHCP protocol m and its UML model in |3D1-) 
Servers, clients and IP addresses are introduced. A client can discover a server 
and request an IP address for its connection to the network from the server, 
provided the latter has offered it. The request is answered by an acknowledgment 
(ack) or not (nak), depending on the actual availability of the address. 

To define the formal semantics of a class diagram — in the sense discussed in 
the introduction — the set of admissible interpretations as transformation systems 
has to be given. Since class diagrams focus on the static structure we start with 
the discussion of the data states and action structures and their signature. 

Each single class $C$ in the class diagram yields a data space signature 
$\Sigma_C$ as follows. The class name and each type or class used for an attribute 
or an operation yield a sort name. The class sorts will be interpreted as sets of 
references to objects of these classes in the data states of the objects. Each 
attribute of the class $C$ is translated to a constant of the corresponding sort in 
$\Sigma_C$, and for each operation $op$ with return type $t$ a constant 
$ret\_op : t$ is introduced. This is used in the data states of the active object to 
represent the return value of the operation in the state after its execution. Finally, 
for each operation $op$ with parameter type list $t_1, \ldots, t_n$ an action name 
$op : t_1, \ldots, t_n$ is introduced into the action signature of $\Sigma_C$. 
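
The translation of a single class can be sketched as follows. The representation of a signature as a triple of sorts, constants, and actions is an assumption of this example, as is the spelling `ret_` for the return value constants. The operations `ack` and `isConnected` are hypothetical; only the attribute `myIPAddress : IPAddress` is taken from the example discussed in this section.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Operation:
    name: str
    params: List[str] = field(default_factory=list)   # parameter type names
    returns: Optional[str] = None                      # return type name, if any


@dataclass
class ClassSpec:
    name: str
    attributes: Dict[str, str]        # attribute name -> type or class name
    operations: List[Operation]


def class_signature(cls: ClassSpec) -> Tuple[Set[str], Dict[str, str], Dict[str, Tuple[str, ...]]]:
    """Translate a class into (sorts, constants, actions) of its data space signature."""
    sorts: Set[str] = {cls.name}
    constants: Dict[str, str] = {}
    actions: Dict[str, Tuple[str, ...]] = {}
    for attr, typ in cls.attributes.items():
        sorts.add(typ)
        constants[attr] = typ                 # each attribute becomes a constant
    for op in cls.operations:
        sorts.update(op.params)
        actions[op.name] = tuple(op.params)   # action name with parameter type list
        if op.returns is not None:
            sorts.add(op.returns)
            constants["ret_" + op.name] = op.returns   # holds the return value
    return sorts, constants, actions


# Hypothetical operations on the Client class of the network protocol example.
client = ClassSpec(
    name="Client",
    attributes={"myIPAddress": "IPAddress"},
    operations=[Operation("ack", ["IPAddress"]), Operation("isConnected", [], "Bool")],
)
sorts, constants, actions = class_signature(client)
```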

An association $as$ between classes $C$ and $D$ with role names $myD$ and $myC$ 
respectively (like the aServer - aClient association in Fig. 1) yields a further 
data space signature $\Sigma_{as}$ with two sorts $C$ and $D$ and a predicate 
$a : C, D$. The action signature is empty. In each system state this models the 
links between objects of classes $C$ and $D$. Furthermore constants 
$myD : set(D)$ and $myC : set(C)$ are added to the signatures $\Sigma_C$ and 
$\Sigma_D$ respectively, if navigation in the corresponding direction is possible. 
These represent the references to objects that are linked to the concerned object 
via the association $as$. Finally an inheritance relation of classes $Super$ and 
$Sub$ is mapped by adding the items of the signature $\Sigma_{Super}$ to 
$\Sigma_{Sub}$ and relating the two by a signature inclusion morphism. 

Fig. 1. Class diagram of a network protocol 

In this way the whole class diagram is translated into a diagram of data 
space signatures (for details see ^H])- Such a diagram is considered as the data 
space signature for the transformation systems that constitute the formal se- 
mantics of the class diagram. W.r.t. this signature now also the data states and 
transformations of an admissible transformation system are discussed. 

A data state represents the state of the whole set of objects in the system at 
a certain point in time. Thus the following information has to be given. 

— How many objects are there for each class? 

— What is the state of each object, i.e., what are its attribute values and which 
object references does it maintain? 

— How are the objects linked? 

Formally a system snapshot is given by a tuple ref) defined as follows. 

— I = (Ic)c^c is a family of sets Ic indexed over the set C of classes in the 
class diagram. The elements of Ic are indexes to the objects of class C , that 
can be considered as abstract object addresses. 

— A = (Af)i^j^^ceC is a family of partial algebras Ap of signature EC for 
each object index i £ Ic and class C £ C. The algebra Ap represents the 
actual state of the object i of class C by the values of its attributes, its sets 
of references, etc. The association i Ap yields the object state associated 
to the index (address) i. 
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— For each association as a partial algebra L of signature Sas is given, whose 
carrier sets are given by the object index sets Ic and Id - Thus L is basically 
given by a relation on these object index sets, representing the links of objects 
in the actual state. 

— For each algebra $A_i^C$ in $A$ and each class sort $D$ that occurs in $\Sigma_C$ a partial
mapping $ref_{i,D}^C : (A_i^C)_D \to I_D$ is given, collected in the family $ref$. This
represents the values of the object references the object $i$ maintains internally
in that it yields an object index (address) $ref_{i,D}^C(r) \in I_D$ of the right type
for each object reference $r$ inside $i$. Moreover, it is required that each
object has a reference to itself, i.e., for each $i \in I_C$ there is a distinguished
reference $self \in (A_i^C)_C$ with $ref_{i,C}^C(self) = i$.

Note that the internal object references are elements of the algebra $A_i^C$,
whereas the object indexes obtained by the $ref$-functions reside outside $A_i^C$.

A system snapshot conforming to the class diagram shown in Fig. 1 is shown
in Fig. 2. Two Server-objects, two IPAddress-objects, and one Client-object




Fig. 2. Object configuration for the network protocol 
are shown. The corresponding index sets are $I_{Server} = \{S1, S2\}$,
$I_{IPAddress} = \{IP1, IP2\}$, and $I_{Client} = \{C1\}$ respectively. The graphical presentation of an
object state as partial algebra is explained in Fig. 3, which shows the
Client-signature and object C1 as its instance. The association states are depicted
in Fig. 2 by the algebras (relations) L1 and L2 (the carrier sets are omitted,
since they are given by the index sets). The $ref$-functions realizing the links by
dereferencing are indicated by the arrows from the reference sets to the objects.
Note that the link of C1 and IP2 is not supported by an association, but by the
attribute myIPAddress : IPAddress.
Fig. 3. State of the Client-object C1 as partial $\Sigma_{Client}$-algebra



The ingredients of a system snapshot have to obey the following consistency 
conditions. 

— For each pair $(i, i')$ in an association algebra $L$, with $i \in I_C$ and $i' \in I_D$,
there must be a reference $r$ in the object $A_i^C$ such that $ref_{i,D}^C(r) = i'$ if
navigation is possible in this direction, and vice versa.

— If $i \in I_{Sub}$ for some class Sub which is a subclass of a class Super then also
$i \in I_{Super}$ and the reduct of $A_i^{Sub}$ to the smaller signature $\Sigma_{Super}$ coincides
with $A_i^{Super}$. That means, each object of the subclass can be considered also
as an object of the superclass, via the same index in another index set.

— Association multiplicities must be respected and for each (component) object 
of a class with a composition association to a composite class there must be 
a composite object it belongs to. 

This defines all admissible data states of a transformation system for the class 
diagram. 
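The snapshot structure and two of its conditions can be sketched in Python as follows. This is only an illustration under stated assumptions, not the formal definition: the class name `Snapshot`, the dictionary encoding of the algebras, the attribute value of IP2, the reference name `address`, and the content of the association `L1` are invented for the sketch. Only the index sets and the `myIPAddress` link from C1 to IP2 are taken from the example above.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    index: dict   # I: class name -> set of object indexes
    state: dict   # A: (class, index) -> attribute values of that object
    links: dict   # L: association -> (source class, set of (i, i') pairs)
    ref: dict     # ref: (class, index) -> {reference name: object index}

    def has_self_references(self):
        """Every object i holds a reference `self` with ref(self) = i."""
        return all(
            self.ref.get((cls, i), {}).get("self") == i
            for cls, indexes in self.index.items()
            for i in indexes
        )

    def links_supported(self):
        """Every pair (i, i') of an association is backed by a reference
        r of object i with ref(r) = i' (navigation from i to i' assumed)."""
        return all(
            j in self.ref.get((src, i), {}).values()
            for src, pairs in self.links.values()
            for i, j in pairs
        )


# Index sets as in the example; attribute values and the link L1 are invented.
snapshot = Snapshot(
    index={"Server": {"S1", "S2"}, "IPAddress": {"IP1", "IP2"}, "Client": {"C1"}},
    state={("IPAddress", "IP2"): {"bound": True}},
    links={"L1": ("Server", {("S2", "IP2")})},
    ref={
        ("Server", "S1"): {"self": "S1"},
        ("Server", "S2"): {"self": "S2", "address": "IP2"},
        ("IPAddress", "IP1"): {"self": "IP1"},
        ("IPAddress", "IP2"): {"self": "IP2"},
        ("Client", "C1"): {"self": "C1", "myIPAddress": "IP2"},
    },
)
print(snapshot.has_self_references(), snapshot.links_supported())
```

The subclass condition and the multiplicity condition are omitted here; they would be further predicates over the same four components.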

An action structure for a transformation step from one system snapshot to
another one is given by a set of operation calls. Thus parallel actions are used,
corresponding to the possible concurrency of the objects in the system. However,
each operation call must be decorated with a target that determines the
object that executes the operation. For that purpose the object indexes are
employed, using the common dot notation $i.op(p_1, \ldots, p_n)$ to indicate that object $i$
is called to execute its operation $op$ with the actual parameters $p_1, \ldots, p_n$. With
system snapshots as data states and sets of directed operation calls as action
structures a class diagram yields a data space, i.e., a space of possible system
states and state transformations. Since there is no behaviour specification the set
of admissible interpretations of the class diagram is given by all transformation
systems with this data space. That means, arbitrary transition graphs can be
chosen with labels in this data space.

4 State Machine Semantics 

In this section we consider a UML modelling technique for the intra-object be- 
haviour. State machines specify how objects of a given class react to operation 
calls (or other events in their environment). 

Basically, a state machine is a Mealy automaton with events as input and 
actions as output. Its transitions may be constrained in addition by guards, i.e., 
boolean expressions specifying conditions on the data state of the object w.r.t. 
the parameters of the operation call. State machines may use several further 
features, like hierarchical states, parallel regions, history markers etc. Their
semantics has been defined in several papers by reducing them to some kind of
labelled transition system (see for instance [?]), corresponding to the
different variants of state charts and state machines (see [?] for a survey). Thus
as a starting point for the definition of the transformation system semantics we 
may assume a representation of a first level operational semantics in terms of 
appropriate Mealy automata (with guarded transitions) already. 

The basic idea of the interpretation of state machines by transformation 
systems is to add the data states and action structures in all possible ways, 
dual to the interpretation of class diagrams. Although actions are used in state 
machines their effect on the data states of the objects is not specified. That 
means, a state machine refers to data states and data state transformations via 
the actions and the guards, but it does not specify them. This gives rise to a set 
of admissible interpretations again. 

The first step in the construction of a transformation system for a state
machine is to add data states $D$ to the state machine states $s$, i.e., to build pairs
$(s, D)$. Such a data state $D = (A, I, ref)$ is given by

— an algebra $A$ representing the state of the concerned object, with signature
$\Sigma_C$ induced by the corresponding class,

— a family $I = (I_C)_{C \in \mathcal{C}}$ of object indexes as in a system snapshot; these may
be used as object parameters in events and actions,

— $ref$-functions as above, that point to the object indexes associated by the
object to its internal references.
Note that references can be dereferenced, but the object at this address cannot
be accessed. That means, D represents the state of one object, including a part 
of the surrounding system structure, but not the whole system as in a system 
snapshot. Thus D can be considered as an incomplete system snapshot with 
only one object and several unbound object indexes. The self-reference, its
corresponding index, and the association with the object state $A$ thereby represent
state machine instances. That means, different data states can be defined using 
different object index sets that then represent different instances of the same 
state machine, distinguished by the indexes. 

The data state as data space label of a control state $(s, D)$ within a
transformation system for the state machine is given by the projection to $D$. (A similar
construction of states as pairs of control states and data states has been used in
the abstract object model introduced in [?].)

The second step of the construction of admissible transformation system
interpretations of state machines consists in the definition of the transitions.
For that purpose consider first a transition $e[g]/a : s_1 \to s_2$ in the state
machine with event/operation call $e = op(x_1, \ldots, x_n)$ with formal
parameters $x_1, \ldots, x_n$, guard/boolean expression $g$, and a synchronous operation call
$a = op(p_1, \ldots, p_n)$ to another object as action, with actual or formal parameters
$p_1, \ldots, p_n$. Then

— for each data state $D_1$ that satisfies those conditions in $g$ that only refer to
the given object ($self$),

— and each data state $D_2$

there may be a transition $e^+[g^-]/a^+ : (s_1, D_1) \to (s_2, D_2)$ in the transition
graph of a transformation system for the state machine, where

— $e^+ = op(a_1, \ldots, a_n)$ is any instantiation of $e$ by actual parameters,

— $g^-$ is a corresponding instantiation of the remaining conditions of $g$ that
have not yet been evaluated, i.e., the ones that refer to other objects via
navigation,

— $a^+$ is the instantiation of $a$ according to the one of $e$.

The set of event instances $e^+$ represents the possible operation call instances
as non-determinism in the transition graph. The actual calls will then be
selected by other objects from this set. This selection is technically obtained by
the composition of the corresponding state machines resp. the corresponding
transformation systems. The technique of representing input actions/events as
sets of choices and communication as selection from this choice is adapted from
process calculi like CCS and LOTOS [?].

The effect of the execution of $op(a_1, \ldots, a_n)$ on the data state of the object is
represented by the data states $D_2$. Since the operation semantics is not specified
in the state machine any data state $D_2$ is admissible. The operation may be
non-deterministic, represented by transitions with the same event instance $e^+$
to different output states. For completeness it is required that in each admissible
transformation system there must be a transition for each event instance that
satisfies the guard.
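The construction of candidate transitions and the completeness requirement can be illustrated by a small Python sketch. It assumes finite sets of data states and parameter values so that everything can be enumerated, and it leaves out the guard part $g^-$ and the action $a^+$. The function names, the control states "free" and "bound", and the reduction of a data state to the value of the attribute `bound` are assumptions made for the illustration.

```python
from itertools import product


def candidate_transitions(s1, s2, op, arity, local_guard, data_states, values):
    """All transitions (s1, D1) --op(a1..an)--> (s2, D2) that an admissible
    transformation system may contain for one state machine transition."""
    result = []
    for d1 in data_states:
        if not local_guard(d1):
            continue  # D1 must satisfy the guard conditions that refer to self
        for args in product(values, repeat=arity):  # event instances e+
            for d2 in data_states:                  # any D2 is admissible
                result.append(((s1, d1), (op, args), (s2, d2)))
    return result


def is_complete(chosen, s1, op, arity, local_guard, data_states, values):
    """Completeness: each enabled event instance has at least one transition."""
    covered = {(source, event) for source, event, _target in chosen}
    return all(
        ((s1, d1), (op, args)) in covered
        for d1 in data_states if local_guard(d1)
        for args in product(values, repeat=arity)
    )


def is_unbound(d):
    return d is False


# Hypothetical IPAddress machine: `bind` leads from "free" to "bound".
data_states = (False, True)
candidates = candidate_transitions(
    "free", "bound", "bind", 0, is_unbound, data_states, values=()
)
print(len(candidates))
```

An admissible transformation system then corresponds to a choice of a subset of `candidates` for which `is_complete` holds; choosing several targets for the same event instance models a non-deterministic operation.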
In addition to the transitions induced by the transitions of the state machine
a transformation system for the state machine should have formal idle transitions
$(s, D) \to (s, D)$ for each state $(s, D)$. The idle transitions allow the definition of
global system steps as tuples of steps of the local components. Components not
taking part in a global step formally perform an idle step then (cf. Sect. 4.1).

The action structure of the transition $e^+[g^-]/a^+ : (s_1, D_1) \to (s_2, D_2)$ is
given by the whole triple $e^+[g^-]/a^+$, whereby each incoming operation call is
prefixed by the index of the active object. Each event thus obtains the index of
the object instantiating the state machine and executing the incoming operation
calls. The complete information in the action structure is needed later for the
composition with other transformation systems for state machines. The action
structure of an idle transition is the internal action (empty set or $\tau$).

For a transition $e[g]/a_1; \ldots; a_k : s_1 \to s_2$ in the state machine with a
sequence of asynchronous operation calls $a_1; \ldots; a_k$ intermediate states are
introduced to split the local operation execution ($e$) from the subsequent calls to
other objects (see Fig. 4). The latter cannot change the data state of the given
object, due to encapsulation, since they do not belong to the operations of the
class.
Fig. 4. Synchronous and asynchronous actions with data space attachments



Consider as example the state machines of the Server and IPAddress classes
shown in Fig. 5 that are already Mealy automata. The names of the transitions
represent the different behaviours of the public operations of these classes,
depending on the control states of the objects and guards checking the parameters.
The transition request1 from state hf = Has free IPAddresses to hf for instance
has the label

request(reqIP, client, server)[(server = self) and (reqIP.getBinding = false)] /
reqIP.bind(); client.ack(reqIP, self).

(The complete state machine as well as the one for the Client-class can be found
in [?].) Transitions in a transformation system for the Server-state machine
Fig. 5. State machines for Server and IPAddress objects



corresponding to the request1 transition are shown in the central column of
Fig. 6. Thereby the local data state $D_{S2}$ is given by the state of the object S2 as
shown in Fig. 2 with the object index sets {S1, S2}, {IP1, IP2}, {C1} and also
the $ref$-functions for S2 as in Fig. 2. Note that in this example the operation
request does not change the state of the server, but only triggers further actions.
Fig. 6. Synchronous steps of IPAddress, Server, and Client objects



4.1 Composition of State Machines 

The collaboration of objects whose behaviour is specified by state machines is 
induced by the mutual operation calls, modelled as events and actions. Seman- 
tically the corresponding composition of the state machines can be represented 
by the composition of transformation systems as discussed in Sect. [?]. For that
purpose an identification and a synchronization relation must be derived, which 
then yields the interconnection of the transformation systems as well as a global 
view of the common behaviour of the composition as a single object. 

Consider for that purpose the steps of the IPAddress, Server, and
Client-objects shown in Fig. 6. With the identification relation the sharing of pervasive
static data types like Strings and Booleans is expressed, i.e., they are assumed
to be identical for all objects. There are no shared attributes or actions,
corresponding to the object paradigm.

States $(s, D)$ and $(s', D')$ are synchronous if the object index sets $I$ and $I'$
in the data states $D$ and $D'$ coincide, i.e., each object refers to the same system
environment. (Furthermore the interpretations of the shared static data types
in the objects must coincide.) Since each object has a reference to itself the
corresponding index $i$ is already associated with the object state $A_i$. Together
with the union of the $ref$-functions given for each object in its data state $D$ this
yields the complete set of links in the composed system.

Transitions are synchronous if their commencing and ending states are
synchronous and one contains an action $a = i.op(p_1, \ldots, p_n)$ and the other one
contains the complementary event $e = i.op(p_1, \ldots, p_n)$. That means, the call
action is synchronized with the receipt of the call. If the operation call is
synchronous this means that the induced actions of the call have to be performed
in the same step and the calling object has to wait until these are finished. In
an asynchronous operation call the calling object just delivers its call to finish
the step. The actions take place in the subsequent steps. Beyond these
synchronizations the idle transitions are synchronous with all those transitions whose
commencing and ending states are synchronous with the state of the idle
transition. In Fig. 6 steps of the IPAddress, Server, and Client-objects IP2, S2,
and C1 are shown. In state $D_{IP2}$ the value of the bound attribute is false, after
the execution of the bind-operation in state $D'_{IP2}$ it is true. In the initial state
$D_{C1}$ of the client the attributes myIPAddress and myServer are yet undefined.
It sends a request to server S2 (initiated by an offer not shown in this cut), waits
for an acknowledgment, and updates its attributes accordingly. The states $D_{IP2}$,
$D_{S2}$, and $D_{C1}$ are the incomplete system snapshots corresponding to the objects
IP2, S2, and C1 as shown in Fig. 2. According to the definition given above all
transitions on the same horizontal layer are thus synchronous with each other.
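The synchronization relation described above can be sketched in Python. The encoding is an assumption made for illustration: a state is a pair of a control state and a data state given as a dictionary with an `index` entry, a label is a dictionary with one `event` and a set of `actions`, and an operation call is a tuple of target index, operation name, and arguments. The control state names for the IPAddress and Client objects are hypothetical, and the coincidence of shared static data types is not checked.

```python
def is_idle(label):
    """An idle transition carries neither an event nor actions."""
    return label["event"] is None and not label["actions"]


def states_synchronous(state_a, state_b):
    """(s, D) and (s', D') are synchronous if their index sets coincide."""
    (_, data_a), (_, data_b) = state_a, state_b
    return data_a["index"] == data_b["index"]


def transitions_synchronous(t_a, t_b):
    """Synchronous source and target states, plus either an idle step or
    a call action on one side matching the event on the other side."""
    (src_a, label_a, dst_a), (src_b, label_b, dst_b) = t_a, t_b
    if not (states_synchronous(src_a, src_b) and states_synchronous(dst_a, dst_b)):
        return False
    if is_idle(label_a) or is_idle(label_b):
        return True
    return (label_b["event"] in label_a["actions"]
            or label_a["event"] in label_b["actions"])


index = {"Server": {"S1", "S2"}, "IPAddress": {"IP1", "IP2"}, "Client": {"C1"}}
bind_call = ("IP2", "bind", ())
request_call = ("S2", "request", ("IP2", "C1", "S2"))

server_step = (("hf", {"index": index}),
               {"event": request_call, "actions": {bind_call}},
               ("hf", {"index": index}))
ip_step = (("free", {"index": index, "bound": False}),
           {"event": bind_call, "actions": set()},
           ("bound", {"index": index, "bound": True}))
idle_step = (("waiting", {"index": index}),
             {"event": None, "actions": set()},
             ("waiting", {"index": index}))

print(transitions_synchronous(server_step, ip_step))
print(transitions_synchronous(idle_step, ip_step))
```

A global step of the composed system would then be a tuple of local steps that are pairwise synchronous in this sense.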

This composition and synchronization of state machines also allows a com- 
parison with the class semantics. Both sets of admissible interpretations are now 
sets of transformation systems of the same data space (signature) except that 
the action structures of the composed state machine transformation system still 
contain the communication information (guards and actions). If a composition 
is considered as complete and shall be integrated with the class semantics the 
additional composition information has to be hidden, i.e., the labels have to 
be projected to the event components. The intersection of the class diagram 
and composed state machine semantics then yields the interpretations that are 
admissible from both points of view. This excludes for instance all class interpre- 
tations that do not offer the operations in the sequences specified by the state 
machines, and all (composed) state machine interpretations where a client has 
more than one server or IP-addresses are not bound to servers. These constraints 
are specified in the class diagram but not in the state machine diagram. 

Finally, an abstraction of composed systems given by object collaborations 
via state machine compositions or as class diagram semantics can be given, based 
on the composition operation for transformation systems. In the global view, i.e.,
the result of the composition operation, the composed system of Fig. 6 looks like
a single object. Its class (structure) is given by the union of the three classes
Server, IPAddress and Client (i.e., the union of their attributes, associations
represented as attributes, and their operations). Each state of the global
object is given by the amalgamation of the states of the local objects, including
the internal object index sets as a further sort. Its actions (operation
executions) are given by the union of the local actions. According to the
internalization of the structure the active objects are no longer visible, i.e., the object
indexes are hidden. The part of the network behaviour of the object
composition S2, IP2, C1 for example is thus given by the sequential operation executions
request(IP2, C1, S2); bind; ack(IP2, S2).

5 Sequence Diagram Semantics 

Sequence and collaboration diagrams in the UML specify the interaction of ob- 
jects, i.e., the inter-object behaviour, via the exchange of messages. Sequence 
diagrams graphically stress the temporal order, whereas collaboration diagrams 
show the object structure corresponding to the class diagram. Semantically they 
are equivalent. They can be integrated into the transformation system framework 
easily, following the constructions and interpretations for collections of objects 
in the previous sections. 

As starting point for the definition of the set of admissible interpretations we
assume again a transition system semantics, as given for instance in [?] for
live sequence charts that are a conservative extension of sequence diagrams.

For the construction of the transformation system semantics consider for
example the sequence diagram in Fig. 7. This yields system snapshots and
transformation steps as follows. The object instances yield object index sets
$I_{Client} = \{client\}$ and $I_{Server} = \{server\}$. Arbitrary data states for the
objects, i.e., $\Sigma_{Client}$- and $\Sigma_{Server}$-algebras, and links ($ref$-functions) supporting
the message exchanges, can be added. These are not constrained by the
sequence diagram. Each receipt of a message (= operation call) indicates a step
of the system, given by the execution of the corresponding action. (For the sake
of brevity a sequential order of messages with asynchronous operation calls
and message transfer is considered as example here.) Analogous to the
construction of state machine transformation systems these transitions can be
coupled with arbitrary data state transformations. The corresponding action
labels are given by the messages of the sequence diagram, prefixed by the object
instance receiving the message. We thus obtain a sequence of transformation
steps client.connect(); server.discover(client); server.getFreeIP(); ... describing
a specific view of the system.
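How the action labels arise from the messages can be shown with a few lines of Python. The encoding of a message as a triple of receiving instance, operation name, and arguments is an assumption of this sketch; the three messages are the ones named in the text.

```python
def action_labels(messages):
    """Prefix each message (operation call) with the instance receiving it."""
    return [f"{receiver}.{op}({', '.join(args)})" for receiver, op, args in messages]


messages = [
    ("client", "connect", ()),
    ("server", "discover", ("client",)),
    ("server", "getFreeIP", ()),
]
print("; ".join(action_labels(messages)))
```

Each label stands for one transformation step; the data states before and after the step are left open, as described above.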

For an integration of the set of admissible interpretations of sequence
diagrams obtained in this way with the class diagram semantics the object instance
names have to be unified. For example, the instance names client and server
used in the sequence diagram should correspond to C1 and S2 chosen in
another model. This is supported by the definition of system snapshots in the class
diagram transformation systems that may comprise arbitrary object index sets.
In this way correspondences between models developed independently of each
other (by different persons at different sites), for example, can be established.

Fig. 7. A sequence diagram for the network protocol

To compare sequence diagrams with state machine diagrams and check their 
consistency an appropriate composition of state machine transformation systems 
has to be considered, corresponding to the instances defined in the sequence 
diagram. Vice versa, projections from the sequence diagram transformation sys- 
tems to the single state machine transformation systems can be defined. The 
existence of a projection then proves that the sequence diagram is consistent 
with the state machines (see [27] for a more detailed discussion). That means,
the scenario specified by the sequence diagram conforms to the capabilities of 
the objects as specified by their state machines. 
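A much simplified version of this consistency check can be sketched in Python: the scenario is projected to the operations received by one object, and the result is run through that object's state machine. The sketch ignores data states, guards, and actions, and treats the state machine as a deterministic automaton. The control states `waiting` and `contacted` of the server machine are hypothetical and are not taken from Fig. 5.

```python
def project(trace, obj):
    """Keep, in order, the operations received by the object `obj`."""
    return [op for receiver, op in trace if receiver == obj]


def conforms(machine, start, operations):
    """Check that the automaton can perform the operations in sequence.
    `machine` maps (control state, operation) to the next control state."""
    state = start
    for op in operations:
        if (state, op) not in machine:
            return False
        state = machine[(state, op)]
    return True


trace = [("client", "connect"), ("server", "discover"), ("server", "getFreeIP")]
server_machine = {
    ("waiting", "discover"): "contacted",
    ("contacted", "getFreeIP"): "waiting",
}
server_ok = conforms(server_machine, "waiting", project(trace, "server"))
print(server_ok)
```

In the full framework the projection also has to respect the data states and the synchronization of actions with events, so this check covers only the control flow of a single object.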



6 Conclusion 

Transformation systems have been introduced as elements of an integrating se- 
mantics for object oriented system models as given by the different diagrams 
of the Unified Modeling Language for instance. According to the separation of 
concerns realized by the viewpoint models each one only provides partial infor- 
mation about the system. Semantically this under-specification is reflected in 
defining sets of admissible interpretations for each model. These are given by 
transformation systems as formal mathematical representations of whole sys- 
tems, incorporating their structure, behaviour, and internal interaction. The 
composition operations for transformation systems support the composition of 
components or subsystems, as for example the connection and collaboration of 
objects as state machine instances, as well as the abstraction from the internal 
structure. That means, structural transparency is supported in the formal model
much more than in most software modelling techniques. Furthermore, refinement 
and other development operations and relations have been defined for transfor- 
mation systems. Using this integrating formal semantics these can be employed 
to formulate corresponding refinement relations for software system models as 
rules referring to the syntax of the languages directly. 

The heterogeneity of the models is addressed by using one common semantic 
domain on the one hand, and semantic transformations corresponding to the 
mutual relationships of the models on the other hand. For example, state machine 
transformation systems that represent single object instances can be composed 
to obtain system models. These can then be compared with interpretations of 
sequence diagrams, where the number of objects is fixed. Thereby the structure 
that has been needed for the composition (like the distinction of events and 
actions and guards that may refer to the other components) must be hidden 
by projecting to the relevant parts. Vice versa, the system view taken by class 
diagrams comprises systems of several objects and (degenerate) systems of single
objects, which allows a comparison with the other types of diagrams. 

The semantic interpretation in a common domain supports the definition of 
correspondences between different models that may have been developed inde- 
pendently of each other. At the same time, the intersection of the (appropriately 
transformed) sets of admissible interpretations yields a formal definition of con- 
sistency (corresponding to the correspondences established before). Note that 
due to the formal approach the sets are large in general, since informal mean- 
ings induced by the names used in the models cannot be taken into account. 
To reduce these, further specification means may be considered, like object
constraints in the UML. The corresponding abstract syntactic means for the
representation of properties of transformation systems are introduced in [20], where
also preservation results for development relations and composition operations
are discussed.

The aim of these investigations conforms to the efforts of making software
systems modelling languages more precise, as supported by the pUML group
for instance. Although the style and presentation in the original contributions is
mathematical and thus does not conform to the UML standards, results can be
achieved that are relevant and could be incorporated in appropriate meta model 
extensions. This could be one way to transfer theoretical results into improved 
software systems development standards. 
Abstract. Labelled partial orders in concurrency are a natural and powerful
modelling formalism. Recently, there has been a renewed focus on
such models arising in various areas of applications. We survey some re- 
sults on interesting problems for partial order based models, focussing 
on decidability issues. 



Summary 

Within Concurrency a multitude of models have been suggested and studied,
for a survey see [?]. In the report from the Concurrency Working Group
from ACM Strategic Directions in Computing Research [?], these models
were classified on the basis of the stances they adopt with respect to three basic
dichotomies:

— Intensionality versus Extensionality 

— Interleaving versus True Concurrency 

— Branching versus Linear Time 

Most of the successful research within concurrency has been dealing with 
interleaving models, resulting in a great deal of insight and practical tools for 
reasoning about concurrent systems. However, a substantial number of results 
have been obtained also on so-called true concurrency or noninterleaving models. 
Recently, there has been a renewed focus on such models arising in quite
different areas of applications. It turns out that for most of these applications, the
extensional model is formally some version of labelled partial orders (pomset
languages [P86] or event structures [WN95]), whereas the intensional models come
in a variety of different shapes (following the interpretation of intensionality and
extensionality from the report mentioned above).

One such example is Message Sequence Charts [?], a widely used
formalism for describing system requirements at an early stage of development.
MSC’s occur in a number of different software methodologies, and have been
used extensively in e.g. the development of distributed telecommunication
software. Formally, an MSC is a labelled partial order. The elements of the partial
order represent events in the form of sends and receipts of messages between
different sequentially computing agents p and q, indicated by the labelling p!q
and p?q respectively. The partial order represents the causal order of activity
induced by the sequential ordering of events of the individual agents, combined
with the ordering imposed by the sends and receipts of individual messages. The
most commonly used intensional counterpart of MSC’s is a so-called Message
Sequence Graph, a finite graph generating a collection of MSC’s. They are a
powerful formalism with the consequence that many interesting questions on
their behaviours (MSC’s) become undecidable or intractable, see e.g. [AY99].
Recently, promising attempts have been made in order to identify a notion of
regular MSC languages along with an associated set of reasoning tools [HMKT00].
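The way the causal order of an MSC arises from the local orders of the agents and the message pairs can be sketched in Python. The event names, the third agent r, and the function name `causal_order` are assumptions made for this small example; the closure is computed naively, which is adequate only for small charts.

```python
from itertools import product


def causal_order(agent_events, messages):
    """Strict causal order of an MSC as a set of pairs (e, f), e before f.
    agent_events: agent -> list of its events in local (sequential) order.
    messages: list of (send event, matching receive event)."""
    order = set(messages)
    for events in agent_events.values():
        order |= set(zip(events, events[1:]))  # sequential order per agent
    while True:  # transitive closure by repeated composition
        composed = {(a, d) for (a, b), (c, d) in product(order, order) if b == c}
        if composed <= order:
            return order
        order |= composed


# Agent p first sends to q (event a), then to r (event b).
agent_events = {"p": ["a", "b"], "q": ["c"], "r": ["d"]}
labels = {"a": "p!q", "c": "q?p", "b": "p!r", "d": "r?p"}
messages = [("a", "c"), ("b", "d")]
order = causal_order(agent_events, messages)
print(sorted(order))
```

In this example the receipt at q and the receipt at r are causally unrelated, so the order is a proper partial order rather than a sequence.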

Another example is recent work on partial order based models for security
protocols, exemplified by the strand space model from [FHG98], the multiset
rewriting model from [CDMLS00], and the Petri net model from [CW01]. As
with MSC’s the underlying extensional model used is a version of labelled par- 
tial orders, where the events are again representing the sends and receipts of 
messages amongst sequentially communicating agents. The agents either behave 
according to given security protocols, or play the role of intruders following 
certain (restricted) capabilities with respect to e.g. encryption and decryption. 
Compared with MSC’s, the scenarios are simple, since the authentication and 
secrecy protocols studied are most often of bounded length. However, the
labelling is relatively more complex, taking the structure of encrypted messages
into account.

In both examples above, the partial orders in modelling typically arise 
in an attempt to represent the causal dependency amongst events of com- 
municating asynchronous sequential agents. The same phenomenon occurs in 
models for VLSI design, where it has been observed that keeping one global 
clock synchronised is usually the bottleneck in a processor design. Hence new 
architectures have been proposed and successfully applied, notably the so- 
called Globally Asynchronous and Locally Synchronous (GALS) architecture 
of [MHKKHLrP98]. A formal timed and partial order based model of the GALS
principles has been studied recently in [?].



So, labelled partial orders occur in many application areas as a natural mod- 
elling framework, and the reasoning about such models typically involves rea- 
soning about the possible past and future of individual events in a computation, 
with the interpretation that past and future is relative to the causal structure 
of the computation. As indicated above, there is unfortunate evidence from
theoretical research on such models that many of the natural questions to ask on
partial order behaviours (of corresponding intensional models) easily become
undecidable or intractable. This is the case for typical problems like model
checking with respect to logics expressing partial order behaviours and
equivalence checking [?]. We illustrate this by a survey of results, including
the identification of useful special cases which allow algorithmic reasoning.
Abstract. Randomized search heuristics like simulated annealing and
evolutionary algorithms are applied successfully in many different
situations. However, the theory on these algorithms is still in its infancy.
tions. However, the theory on these algorithms is still in its infancy. 
Here it is discussed how and why such a theory should be developed. 
Afterwards, some fundamental results on evolutionary algorithms are 
presented in order to show how theoretical results on randomized search 
heuristics can be proved and how they contribute to the understanding 
of evolutionary algorithms. 



1 Introduction 

Research on the design and analysis of efficient algorithms was quite successful 
during the last decades. The very first successful algorithms (Dantzig’s simplex 
algorithm for linear programming and Ford and Fulkerson’s network flow al- 
gorithm) have no good performance guarantee. Later, research was focused on 
polynomial-time algorithms (see Cormen, Leiserson, and Rivest (1990)) and this 
type of research has been extended to approximation algorithms (see Hochbaum 
(1997)) and randomized algorithms (see Motwani and Raghavan (1995)). Indeed, 
designing and implementing an efficient algorithm with a proven performance 
guarantee is the best we can hope for when considering an algorithmic pro- 
blem. This research has led to a long list of efficient problem-specific algorithms. 
Moreover, several paradigms of algorithms have been developed, among them 
divide-and-conquer, dynamic programming, and branch-and-bound. These are
general techniques to design and analyze algorithms. However, these paradigms
are successful only if they are realized with problem-specific modules. Besides
these algorithms, paradigms for the design of heuristic algorithms have also been
developed, like randomized local search, simulated annealing, and all types of
evolutionary algorithms, among them genetic algorithms and evolution
strategies. These are general classes of search heuristics with many free modules and
parameters. We should distinguish problem-specific applications where we are 
able to choose the modules and parameters knowing properties of the considered 
problem and problem-independent realizations where we design a search heuri- 
stic to solve all problems of a large class of problems. We have to argue why 
one should investigate such a general scenario. One main point is that we obtain 
the frame of a general search heuristic where some details may be changed in
problem-specific applications. Moreover, there are at least two situations where
problem-independent algorithms are of particular interest. First, in many
applications, one does not have enough resources (time, money, specialists, ...) to design a
problem-specific algorithm or problem-specific modules. Second, often we have
to deal with “unknown” functions which have to be maximized. This scenario is 
called black box optimization. It is appropriate for technical systems with free 
parameters where the behavior of the system cannot be described analytically. 
Then we obtain knowledge about the unknown function only by “sampling”. 
The $t$-th search point can be chosen according to some probability distribution
which may depend on the first $t - 1$ search points $x_1, \dots, x_{t-1}$ and their function
values $f(x_1), \dots, f(x_{t-1})$. One main idea of all randomized search heuristics is
to “forget” much of the known information and to make the choice of the probability
distribution dependent only on the “non-forgotten” search points and
their $f$-values.

Our focus is the maximization of pseudo-boolean functions $f : \{0,1\}^n \to \mathbb{R}$,
which covers the problems from combinatorial optimization. We investigate and
analyze randomized search heuristics which are designed to behave well on 
“many” of the “important and interesting” pseudo-boolean functions. Obviously, 
they cannot beat problem-specific algorithms and, also obviously, each rando- 
mized search heuristic is inefficient for most of the functions. The problem is to 
identify for a given randomized search heuristic classes of functions which are 
optimized efficiently and to identify typical functions where the heuristic fails. 
Such theoretical results will support the selection of an appropriate search heu- 
ristic in applications. One may also assume (or hope) that the search heuristic 
behaves well on a function which is “similar” to a function from a class where 
it is proved that the heuristic is efficient. Moreover, the proposed results lead 
to a better understanding of search heuristics. This again leads to the design 
of improved search heuristics and gives hints for a better choice of the parame- 
ters of the search heuristic. Finally, analytical results support the teaching of 
randomized search heuristics. 

In black box optimization the black box (or oracle) answers queries $x$ with
$f(x)$ where $f : \{0,1\}^n \to \mathbb{R}$ is the function to be maximized. Since queries
are expensive, the search cost is defined as the number of queries. For a fixed
search heuristic let $X_f$ be the random number of queries until “some good event”
happens. The good event in this paper is that a query point is $f$-maximal. Then
we are interested in the expected optimization time $E(X_f)$ and the success
probability function $s(t) := \mathrm{Prob}(X_f \le t)$. This is an abstraction from the real
problem, since obtaining the $f$-value of some optimal $x$ does not imply that
we know that $x$ is optimal. In applications, we additionally need good stopping
rules.
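
The query-counting view above can be illustrated with a small sketch. It is only an illustration under stated assumptions: `BlackBox`, `pure_random_search`, and the parameter `optimum_value` are hypothetical names that do not occur in the paper, and pure random sampling merely stands in for an arbitrary search heuristic. The optimal value is used solely to measure $X_f$; a real application does not know it and needs a stopping rule instead.

```python
import random


class BlackBox:
    """Hypothetical oracle wrapper: answers queries x with f(x) and counts them."""

    def __init__(self, f):
        self._f = f
        self.queries = 0

    def __call__(self, x):
        self.queries += 1
        return self._f(x)


def pure_random_search(oracle, n, optimum_value, max_queries, rng):
    """Sample uniform points from {0,1}^n until an f-maximal point is queried.

    Returns the number of queries used (one sample of X_f), or None if the
    budget max_queries is exhausted first.
    """
    for _ in range(max_queries):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        if oracle(x) == optimum_value:
            return oracle.queries
    return None
```

Averaging the returned value over many independent runs gives an empirical estimate of $E(X_f)$, and the fraction of runs that return a value of at most $t$ estimates $s(t)$.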

Our focus is on evolutionary algorithms which have been developed in the 
sixties of the last century and which have found many applications during the 
last ten years. Evolutionary algorithms are described in many monographs (Fogel
(1995), Goldberg (1989), Holland (1975), Schwefel (1995)) and in a more recent
handbook (Bäck, Fogel, and Michalewicz (1997)). The experimental knowledge
is immense, but the theory on evolutionary algorithms is still in its infancy. One 
can find several results on the one-step behavior of evolutionary algorithms, but 
these results most often have no implications on the expected optimization time 
or the success probability function. The famous schema theorem belongs to this 
category. There are even more results using simplifying or even unrealistic as- 
sumptions. The building-block hypothesis is such an idealized hypothesis which 
has turned out to be wrong in many realistic scenarios. Another idealized analysis
works with “infinite populations”. This makes it possible to apply methods
from statistical dynamics. We claim that it is necessary to develop results on the 
expected optimization time and the success probability function which are not 
based on any assumptions, in particular, for generic variants of evolutionary al- 
gorithms and for “interesting” subclasses of functions. This does not exclude the 
investigation of fundamental problems without direct implications for concrete 
algorithms. The paper of Rabani, Rabinovich, and Sinclair (1995) is exemplary 
for such an approach. 

In the rest of the paper, we elucidate our approach with some results. In
Section 2 we introduce the simplest variant of an evolutionary algorithm, the
so-called (1 + 1)EA, and in the following three sections we present results on the
behavior of the (1 + 1)EA. In Section 3 we investigate monotone polynomials
of bounded degree and, in Section 4, the special classes of affine functions and
royal road functions. Section 5 contains an overview of further results on the
(1 + 1)EA and some of its generalizations. In Section 6 we introduce a generic
genetic algorithm which applies a crossover operator and discuss why it is more
difficult to analyze evolutionary algorithms with crossover than evolutionary
algorithms based solely on mutation and selection. Section 7 contains the first
proof that crossover reduces the expected optimization time for some specific
function from exponential to polynomial. We finish with some conclusions.

2 A Simple Evolutionary Algorithm 

We describe the simplest variant of an evolutionary algorithm which works with 
population size 1 and is based solely on selection and mutation. 

Algorithm 1 ((1 + 1)EA).

1.) Initialization: The current string $x \in \{0,1\}^n$ is chosen randomly using the
uniform distribution.

2.) Selection for mutation: The current string $x$ is chosen.

3.) Mutation: The offspring $x'$ of $x$ is created in the following way. The bits $x'_i$
are independent and $\mathrm{Prob}(x'_i = 1 - x_i) = p_m(n)$ (this parameter is called
mutation probability).

4.) Selection of the next generation: The new current string equals $x'$, if $f(x') \ge
f(x)$, and $x$, otherwise.

5.) Continue at Step 2 (until some stopping criterion is fulfilled).
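
A minimal Python sketch of Algorithm 1 may help to fix the five steps. The function name `one_plus_one_ea`, the `max_steps` stopping criterion, and the use of Python's `random` module are assumptions made for illustration. Ties are accepted in Step 4, which agrees with the remark in the text that the algorithm does not accept an offspring with a smaller f-value.

```python
import random


def one_plus_one_ea(f, n, max_steps, rng=None, p_m=None):
    """Sketch of the (1 + 1)EA maximizing f on bit lists of length n.

    f         -- fitness function taking a list of n bits
    max_steps -- stopping criterion (an assumption; the paper leaves it open)
    p_m       -- mutation probability, defaulting to the generic value 1/n
    """
    rng = rng or random.Random()
    p = 1.0 / n if p_m is None else p_m

    # Step 1: uniform random initialization.
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = f(x)

    for _ in range(max_steps):
        # Steps 2 and 3: the current string is selected and every bit is
        # flipped independently with probability p.
        y = [1 - bit if rng.random() < p else bit for bit in x]
        fy = f(y)
        # Step 4: keep the offspring unless it is strictly worse.
        if fy >= fx:
            x, fx = y, fy
    return x, fx
```

For example, `one_plus_one_ea(sum, 20, 20000)` runs the heuristic on the function that counts the ones of a string of length 20.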

The generic value of $p_m(n)$ equals $1/n$ implying that, on average, one bit is
flipped. Then the number of flipping bits is asymptotically Poisson distributed
(with parameter 1). The algorithm can easily be generalized to larger population
size $\mu$. Then Step 2 is not trivial. The number of offspring can be generalized to
$\lambda$. There are many selection schemes for Step 4. The most prominent are $(\mu + \lambda)$-selection
(the best $\mu$ of the $\mu$ parents and the $\lambda$ offspring are chosen) and $(\mu, \lambda)$-selection
(the best $\mu$ of the $\lambda$ offspring are chosen). These two selection schemes
lead to the class of so-called evolution strategies (which have been developed for
continuous search spaces $\mathbb{R}^n$). This explains the notion (1 + 1)EA for Algorithm 1,
which can be interpreted as evolution strategy with population size 1. Another
possibility is to interpret Algorithm 1 as a randomized hill climber, since it does
not accept an offspring with a smaller $f$-value (fitness). A crucial point is that
each $x' \in \{0,1\}^n$ has a positive probability of being created as an offspring of
$x$. Hence, the (1 + 1)EA cannot get stuck forever in a non-optimal region. The
analysis of the (1 + 1)EA is interesting, since

- the (1 + 1)EA is for many functions surprisingly efficient,

- the analysis of the (1 + 1)EA reveals many analytical tools for the analysis
of more general evolutionary algorithms, and

- the (1 + 1)EA can be interpreted as evolutionary algorithm and as randomized
hill climber.

The reason for larger populations is that a single search point may randomly
choose “the wrong way” and may reach a region which makes it difficult to find
the optimum. Working with a larger population one hopes that not all individuals
of the current population go into a wrong direction and that some of them find
the optimum efficiently. However, the individuals are not considered independently.
If the individuals “on the wrong way” have during the following steps a
larger fitness, they may drive out all individuals “on the right way” by selection.
Hence, it is crucial to have a selection scheme which supports “enough” diversity
in the population and which nevertheless eliminates bad individuals. Multi-start
variants of the (1 + 1)EA cope in many situations with these problems, since the
different runs of the (1 + 1)EA are independent. Performing $m(n)$ runs, each for
$t(n)$ steps, leads to a success probability of $1 - (1 - s(t(n)))^{m(n)}$, if $s(t(n))$ is the
success probability of a single run of the (1 + 1)EA.
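
The multi-start formula is easy to evaluate numerically. The helper below is a hypothetical illustration (the name `multistart_success` is not from the paper); it only restates that $m$ independent runs all fail with probability $(1 - s)^m$.

```python
def multistart_success(s, m):
    """Success probability of m independent runs, each succeeding with probability s."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be a probability")
    if m < 0:
        raise ValueError("m must be non-negative")
    return 1.0 - (1.0 - s) ** m
```

Even a small single-run success probability is amplified quickly: with s = 0.01, a few hundred restarts already push the overall success probability above 0.99, whereas an exponentially small s cannot be repaired by polynomially many restarts.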

3 The (1 + 1)EA on Monotone Polynomials 

Pseudo-boolean functions $f : \{0,1\}^n \to \mathbb{R}$ have a unique representation as polynomials, i.e.,

\[ f(x) = \sum_{A \subseteq \{1,\dots,n\}} w_A \prod_{i \in A} x_i . \]

The degree of $f$ is the largest size of a set $A$ where $w_A \neq 0$. It is well known
that the maximization of pseudo-boolean polynomials of degree 2 is NP-hard
and Wegener and Witt (2001) have explicitly defined a degree-2 polynomial
where not only the expected optimization time of the (1 + 1)EA is exponential
but also multi-start variants fail, since for some $c > 0$ the success probability
after [...] steps is [...]. Such a function is almost a worst case function
for the (1 + 1)EA, since the expected optimization time for each pseudo-boolean
function is bounded above by $n^n = 2^{n \log n}$. This follows, since the probability
to produce an optimal string within one step is always lower bounded by $n^{-n}$.
We investigate monotone polynomials, i.e., polynomials where all weights $w_A$,
$A \neq \emptyset$, are non-negative. The (1 + 1)EA treats zeros and ones in the same
way. Therefore, our results also hold for polynomials which are obtained from
monotone polynomials by replacing some variables $x_i$ with $\bar{x}_i = 1 - x_i$. This
includes all affine, i.e., degree-1 functions.
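
The polynomial representation can be stored directly as a map from index sets to weights. The following sketch assumes such a dictionary encoding with 0-based indices; the names `eval_polynomial`, `degree`, and `is_monotone` are hypothetical and chosen only for this illustration.

```python
def eval_polynomial(weights, x):
    """Evaluate f(x) = sum over A of w_A * prod_{i in A} x_i.

    weights -- dict mapping frozenset of 0-based indices A to the weight w_A
    x       -- sequence of bits
    The empty set contributes its weight unconditionally (empty product = 1).
    """
    return sum(w for A, w in weights.items() if all(x[i] for i in A))


def degree(weights):
    """Largest size of a set A with non-zero weight (0 if there is none)."""
    return max((len(A) for A, w in weights.items() if w != 0), default=0)


def is_monotone(weights):
    """True if every weight of a non-empty set A is non-negative."""
    return all(w >= 0 for A, w in weights.items() if len(A) > 0)
```

For a monotone polynomial in this encoding, evaluating at the all-one string adds up every weight, which is why that string is always optimal.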

Knowing that a pseudo-boolean function is a monotone polynomial, the maximization
is trivial. The all-one string always is optimal. However, our motivation
is black box optimization and we like to investigate the behavior of the (1 + 1)EA
on monotone polynomials. This subclass of functions is interesting, since we can
investigate the expected run time with respect to natural parameters, namely the
input length $n$, the degree $d$, and the number $N$ of terms with non-zero weight.
Moreover, improvements are not always possible by the mutation of a small number
of bits and strings with a large Hamming distance from the optimum may
have much larger $f$-values than strings close to the optimum. It is easy to see
that the degree is a crucial parameter. Garnier, Kallel, and Schoenauer (1999)
have proved that the (1 + 1)EA has an expected optimization time of $\Theta(2^n)$
on the $n$-degree polynomial $x_1 x_2 \cdots x_n$. For this function we are searching for
a needle, the all-one string $1^n$, in a big haystack, namely $\{0,1\}^n$. It is obvious
that such functions are difficult for black box optimization. The cited result can
be extended to the general case of $N = 1$.

Lemma 1. The expected optimization time of the (1 + 1)EA on a polynomial
with $N = 1$ and degree $d$ equals $\Theta(n 2^d / d)$.

Sketch of Proof. W.l.o.g. the polynomial equals $x_1 x_2 \cdots x_d$. The probability
that at least one of the $d$ essential bits flips in one step equals $1 - (1 - 1/n)^d =
\Theta(d/n)$. Hence, the expected optimization time is by a factor of $\Theta(n/d)$ larger
than the expected number of so-called active steps where one of the essential
bits flips. As long as we have not found an optimal string, each new string is
accepted and we have to analyze a simple Markoff chain. This can be done
by standard arguments following Garnier, Kallel, and Schoenauer (1999). The
expected number of active steps equals $\Theta(2^d)$ and the upper bound $O(2^d)$ holds
for each initial string. □
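
The estimate $1 - (1 - 1/n)^d = \Theta(d/n)$ used in the proof sketch can be checked by a short chain of standard inequalities. The constants below are one possible choice and are not taken from the paper.

```latex
\[
  \Bigl(1 - \frac{1}{e}\Bigr)\frac{d}{n}
  \;\le\; 1 - e^{-d/n}
  \;\le\; 1 - \Bigl(1 - \frac{1}{n}\Bigr)^{d}
  \;\le\; \frac{d}{n}
  \qquad (1 \le d \le n).
\]
% Right: union bound over the d essential bits.
% Middle: 1 - 1/n <= e^{-1/n}, hence (1 - 1/n)^d <= e^{-d/n}.
% Left: y -> 1 - e^{-y} is concave, so it lies above the chord
%       y -> (1 - 1/e) y on the interval [0, 1]; apply this with y = d/n.
```

Hence the waiting time for an active step is between $n/d$ and $(1 - 1/e)^{-1} n/d$ in expectation, which is the factor $\Theta(n/d)$ in the lemma.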

This lemma proves that the (1 + 1)EA is efficient in the following black box
scenario. We know that the function $f$ is one of the functions which equals
1 if $x_1 = a_1, \dots, x_d = a_d$ for some $(a_1, \dots, a_d) \in \{0,1\}^d$ and 0 otherwise.
No sampling algorithm can generate a smaller average optimization time than
$(2^d + 1)/2$. We have an additional factor of $\Theta(n/d)$ for the so-called passive
steps and only an additional factor of $\Theta(1)$ for active steps visiting some $d$-prefix
which has been visited before. Moreover, for $d = \omega(\log n)$ we cannot hope
for randomized search heuristics with an expected polynomial optimization time.

The following analysis of the (1 + 1)EA on low-degree monotone polynomials 
shows its efficiency on a large class of interesting functions. Moreover, the proof 
presents typical analytical tools. 

Theorem 1. The expected optimization time of the (1 + 1)EA on a monotone
polynomial with $N$ non-vanishing terms and degree $d \le \log n$ is bounded by
$O(N n 2^d / d)$, i.e., by $O(Nn)$ for constant $d$ and by $O(N n^2 / \log n)$ for all $d \le
\log n$.

Sketch of Proof. Let $A(1), \dots, A(N)$ be the $N$ sets such that the weights
$w_{A(i)}$ are non-vanishing, i.e., $w_{A(i)} > 0$, since the polynomial $f$ is monotone. To
simplify the notation we set $w_i = w_{A(i)}$ and assume w.l.o.g. that $w_1 \ge \cdots \ge
w_N > 0$. A weight $w_i$ is called active with respect to the string $a$, if $a_j = 1$ for
all $j \in A(i)$, and $w_i$ is called passive otherwise. The (1 + 1)EA can be described
by a Markoff chain on $\{0,1\}^n$ and we have to estimate the expected time until
we reach a string $a$ such that all weights $w_1, \dots, w_N$ are active with respect to
$a$. The $f$-value of the current string is not decreasing during the search process.

A quite general technique is to partition $\{0,1\}^n$ into “fitness layers” and to
estimate the expected time to leave a non-optimal layer. The choice of the layers
is crucial. Here we choose $N + 1$ layers $L_0, \dots, L_N$ where

\[ L_i := \{ a \mid w_1 + \cdots + w_i \le f(a) < w_1 + \cdots + w_{i+1} \}, \]

if $i < N$, and $L_N := \{ a \mid f(a) = w_1 + \cdots + w_N \}$ consists of all optimal strings.

The search process leaves each layer at most once. If $T_i$ is an upper bound for
the expected time of leaving $L_i$ from an arbitrary $a \in L_i$, then $T_0 + \cdots + T_{N-1}$
is an upper bound for the expected optimization time of the (1 + 1)EA on $f$. We
prove the theorem by proving that $T_i = O(n 2^d / d)$.
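
The layer of a string depends only on its f-value and on the prefix sums of the sorted weights. The sketch below makes this explicit; the function name `layer_index` is hypothetical, and integer weights are used in the usage note only to avoid rounding issues.

```python
from itertools import accumulate


def layer_index(sorted_weights, value):
    """Return i such that a string with f-value `value` lies in layer L_i.

    sorted_weights -- positive weights w_1 >= ... >= w_N
    value          -- an f-value between 0 and w_1 + ... + w_N
    Layer L_i (i < N) holds the values in [w_1+...+w_i, w_1+...+w_{i+1});
    layer L_N holds only the optimal value w_1 + ... + w_N.
    """
    prefix = [0] + list(accumulate(sorted_weights))
    n_terms = len(sorted_weights)
    if value < 0 or value > prefix[n_terms]:
        raise ValueError("value is outside the range of f")
    for i in range(n_terms):
        if prefix[i] <= value < prefix[i + 1]:
            return i
    return n_terms
```

With weights 5, 3, 1, a string in which only the two smaller weights are active has f-value 4 and still lies in $L_0$, so the largest weight is passive there, in line with the argument that follows.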

Let $a \in L_i$. Then, by definition, there exists some $j \le i + 1$ such that $w_j$ is
passive with respect to $a$. Moreover, if $w_j$ gets active while no active $w_m$ gets
passive, we leave $L_i$. We assume w.l.o.g. that the monomial belonging to $w_j$
equals $x_1 x_2 \cdots x_k$, $k \le d$. The idea is to compare the “complicated” Markoff
chain $M_1$ which describes the (1 + 1)EA on $f$ starting with $a$ and stopping when
it leaves $L_i$ with the “simple” Markoff chain $M_2$ which describes the (1 + 1)EA
on $g(x) := x_1 x_2 \cdots x_k$ starting with $a$ and stopping when it reaches a $g$-optimal
string.

The analysis of $M_2$ (see Lemma 1) is simple, since each string is accepted
until the process stops. $M_1$ is more complicated, since it is influenced by the other
monomials. Some new strings are not accepted, since some of the active weights
are deactivated. This can even happen for steps increasing the number of ones
in the $k$-prefix of the string and in the $(n - k)$-suffix of the string. Nevertheless,
since all weights are non-negative, we do not believe that this will be significant.
In order to simplify the analysis we choose for each $m \in \{0, \dots, k\}$ a string
$a^m = (b^m, c^m)$ among the strings in $L_i$ with $m$ ones in the $k$-prefix such
that the expected time of $M_1$ to leave $L_i$ when starting in $a^m$ is maximal. Let
$M_1'$ be the Markoff chain obtained from $M_1$ by replacing each string in $L_i$ with
$m$ ones in the $k$-prefix with $a^m$. Let $M_2'$ be the Markoff chain obtained from
$M_2$ by replacing each string with $m$ ones in the $k$-prefix with $a^m$. The expected
stopping time of $M_2'$ is by definition of $g$ equal to the expected stopping time
of $M_2$. The advantage is that $M_1'$ and $M_2'$ are Markoff chains on the small state
space $\{0, \dots, k\}$ representing the number of ones in the prefix.

It is sufficient to prove that for some constant $c' > 0$ the success probability of
$M_1'$ within $c' n 2^d / d$ steps is bounded below by $\varepsilon > 0$, since the expected number
of such phases then is bounded above by $\varepsilon^{-1}$. We analyze one phase of $M_1'$ and
estimate the failure probability, namely the probability of not leaving $L_i$.

If $w_j$ gets active, it may happen that other weights get passive and we do
not leave $L_i$. However, if $w_j$ gets active, exactly all zeros in the $k$-prefix flip. The
behavior of the bits in the $(n - k)$-suffix is independent of this event. Hence,
with a probability of $(1 - 1/n)^{n-k} \ge e^{-1}$ none of these bits flips implying that
no active weight gets passive and we leave $L_i$. If one suffix bit flips in the step
where $w_j$ gets active, this is considered as a failure. If such a failure happens,
the next phase can be handled in the same way, perhaps with another selected
weight $w_h$ instead of $w_j$. This failure event decreases the success probability of
one phase at most by a factor of $e^{-1}$.

We want to compare $M_1'$ with $M_2'$. In particular, we want to show that $M_1'$
has a larger tendency to increase the number of ones in the $k$-prefix. However,
this is not necessarily true if at least three bits of the $k$-prefix flip. Replacing
110 with 001 may increase the $f$-value while the inverse step decreases the $f$-value.
Therefore, a step with at least three flipping prefix bits is considered as
a failure. The failure probability for one step is at most $\binom{d}{3} n^{-3} \le d^3 n^{-3}$ and the
failure probability for one phase is bounded above by

\[ c' n 2^d d^{-1} \cdot d^3 n^{-3} \le c' d^2 n^{-1} = o(1), \]

since $d \le \log n$.

Let $M_1''$ and $M_2''$ be the Markoff chains $M_1'$ and $M_2'$, respectively, under the
assumption that within one phase there is no step with at least three flipping
prefix bits. The success probability of $M_1''$ and $M_2''$ compared with the success
probability of $M_1'$ and $M_2'$, respectively, is decreased at most by a factor of
$1 - o(1)$. Let $p_1(m, m + d)$, $d \in \{-2, -1, 0, +1, +2\}$, be the transition probabilities
of $M_1''$ on the state space $\{0, \dots, k\}$ and $p_2(m, m + d)$ the corresponding values
of $M_2''$. Then

(1) $p_1(m, m + d) \le p_2(m, m + d)$, if $d \neq 0$.

The reason is that $M_2''$ accepts each new string. Moreover,

(2) $p_2(m, m + d) \, e^{-1} \le p_1(m, m + d)$, if $d > 0$.

This can be proved in the following way. Since at most two prefix bits are flipping,
the number of ones in the prefix increases only if all flipping prefix bits flip from
0 to 1. If furthermore no suffix bit flips (probability at least $e^{-1}$), the new string
is accepted by $M_1''$. Finally, we have to prove a tendency of $M_1''$ of increasing the
ones in the prefix (in comparison to $M_2''$). We claim that

(3) \[ \frac{p_1(m, m + d)}{p_2(m, m + d)} \ge \frac{p_1(m, m - d)}{p_2(m, m - d)}, \quad \text{if } 0 \le m - d < m + d \le k. \]

Let us consider $a^m = (b^m, c^m)$, $m \le k$. Let $c^m_*$ be any suffix. If $M_1$ accepts the
mutated string $(b^m_-, c^m_*)$ where $b^m_-$ is obtained from $b^m$ by flipping one (or two)
ones into zeros, then $M_1$ accepts also $(b^m_+, c^m_*)$ where $b^m_+$ is obtained from $b^m$
by flipping one (or two) zeros into ones. Inequality (3) follows, since $M_2$ accepts
$(b^m_-, c^m_*)$ and $(b^m_+, c^m_*)$.

The Markoff chain $M_2'$ can be easily analyzed using the methods of Garnier,
Kallel, and Schoenauer (1999) and its generalization in the proof of Lemma 1.
The Markoff chain $M_2''$ has the same asymptotic behavior. Inequality (1) shows
that $M_1''$ may stay longer in some state than $M_2''$. However, Inequality (2) shows
that the probabilities of going to a larger state are for $M_1''$ at most by a constant
factor smaller than for $M_2''$. Hence, the effect of staying longer in the same state
does not have a big influence. Inequality (3) is the most important one. It shows that
the probability of increasing the state from $m$ to $m + d$ within one step may be
decreased for $M_1''$ compared to $M_2''$. However, then the probability of decreasing
the state from $m$ to $m - d$ within one step has been decreased at least by the same
factor. This implies that the expected number of active steps (changing the state)
is for $M_1''$ smaller than for $M_2''$. However, the proof of this claim needs a careful
analysis of the Markoff chain $M_1''$ which is omitted here. By Markoff’s inequality,
we can choose a constant $c'$ such that the success probability of $M_1''$ within one
phase of length $c' n 2^d / d$ is at least 1/2. This implies by our considerations that
the success probability of $M_1$ within such a phase is at least $1/(2e) - o(1)$ which
proves the theorem. □



We emphasize one main difference between the analysis of general randomized
search heuristics and problem-specific algorithms. Most of the problem-specific
algorithms are designed with respect to efficiency and also with respect
to the aim of analyzing the algorithm. For monotone polynomials the randomized
hill climber flipping in each step exactly one random bit is not less efficient
but much easier to analyze than the (1 + 1)EA. However, this hill climber has
disadvantages for other functions. It gets stuck in each local maximum while the
(1 + 1)EA can escape efficiently from a local maximum if a string with at least
the same $f$-value and short Hamming distance to the local maximum exists.



4 The (1 + 1)EA on Affine Functions and Royal Road
Functions

Theorem 1 cannot be improved with respect to $d$, see Lemma 1 for the case
$N = 1$. However, our analysis seems to be not optimal for large $N$. In order to
leave the fitness layer $L_i$ we have waited until one specific passive weight is turned
into active. We also may leave $L_i$, since other weights get active. Moreover,
monomials can be correlated positively, e.g., $f(x) = 2 x_1 x_2 \cdots x_d + x_2 x_3 \cdots x_{d+1}$.
It takes some time to leave $L_0$, since we have to activate the first monomial.
Afterwards, no step flipping one of the first $d$ bits which all are 1 is accepted.
Hence, the expected time to activate the second monomial is only $O(n)$. Because
of the monotonicity of the polynomials different monomials cannot be correlated
negatively. One may think that the case of independent monomials is the worst
case.

Let n = dm. We consider monotone polynomials with m monomials with 
non-vanishing weights. All monomials are of degree d and depend on disjoint 
sets of variables. The special case of weights 1 is known as royal road function 
(Mitchell, Forrest, and Holland (1992)), since it has been assumed that these 
functions are difficult for all evolutionary algorithms without crossover and easy 
for genetic algorithms (which are based on crossover). 
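
As a concrete instance of the definition above, the following sketch evaluates a royal road function. It assumes that the disjoint variable sets are the consecutive blocks of length d, which is one natural layout; the paper only requires disjointness. The function name `royal_road` is hypothetical.

```python
def royal_road(x, d):
    """Number of all-one blocks among the n/d consecutive blocks of length d.

    Assumption: the m = n/d monomials use the consecutive index blocks
    {0..d-1}, {d..2d-1}, ...; all weights equal 1.
    """
    n = len(x)
    if d <= 0 or n % d != 0:
        raise ValueError("the block length d must divide n")
    return sum(1 for start in range(0, n, d) if all(x[start:start + d]))
```

A block contributes nothing until all of its d bits are 1, so each block on its own is a needle-in-a-haystack function of the kind treated in Lemma 1.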

Theorem 2. The expected optimization time of the (1 + 1)EA on royal road
functions of degree $d$ is bounded by $O(n (\log n) 2^d / d)$.

Sketch of Proof. First, we consider $m$ independent functions each consisting of
one of the monomials of the royal road function with degree $d$. By Lemma 1 and
Markoff’s inequality, there is a constant $c$ such that the success probability for a
single monomial within $c n 2^d / d$ steps is at least 1/2. The success probability after
$\lceil \log n \rceil + 1$ of such phases is at least $1 - 1/(2n)$ and, therefore, the probability
that all monomials are optimized is at least 1/2. This leads to the proposed
upper bound in the scenario of $m$ independent monomials. However, the (1 + 1)EA
considers the $m$ monomials in parallel. This causes only small differences.
Steps where more active monomials are deactivated than passive monomials
are activated are not accepted and monomials may be deactivated if enough
passive monomials are activated. It is not difficult to prove that this increases
the expected optimization time at most by a constant factor. □
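
The two probability estimates in the proof sketch for independent monomials can be written out as follows. The calculation assumes that $\log$ denotes the binary logarithm and uses $n = dm$.

```latex
\[
  \Bigl(\tfrac{1}{2}\Bigr)^{\lceil \log n \rceil + 1}
  \;\le\; \frac{1}{2} \cdot \frac{1}{n} \;=\; \frac{1}{2n},
  \qquad
  \mathrm{Prob}(\text{some monomial is not optimized})
  \;\le\; m \cdot \frac{1}{2n} \;=\; \frac{1}{2d} \;\le\; \frac{1}{2}.
\]
% First estimate: each phase of length c n 2^d / d fails with probability
% at most 1/2, independently of the starting string.
% Second estimate: union bound over the m = n/d monomials.
% Total length: (ceil(log n) + 1) * c n 2^d / d = O(n (log n) 2^d / d).
```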

A different proof method for Theorem 2 has been presented by Mitchell,
Holland, and Forrest (1994). The result shows that for royal road functions there
is not much room for improvements by crossover. We have seen in Section 3 that
problem-independent search heuristics cannot be successful on the average with
less than $(2^d + 1)/2$ steps. In particular, the improvement by any general search
heuristic is bounded by a polynomial factor of $O(n (\log n) / d)$. The situation gets
more difficult in the case of different weights. Then it is possible that more
monomials get deactivated than activated. In a certain sense we may move far
away from the optimum. This situation has been handled only in the case of
affine functions, i.e., polynomials of degree 1. In this case it is not necessary to
assume non-negative weights, since $x_i$ can be replaced with $1 - x_i$.

Theorem 3. The expected optimization time of the (1 + 1)EA on affine functions
is bounded by $O(n \log n)$. It equals $\Theta(n \log n)$ if all $n$ degree-1 weights are non-zero.

Idea of Proof. The full proof by Droste, Jansen, and Wegener (2001) is involved
and long. W.l.o.g. $w_\emptyset = 0$ and $w_1 \ge \cdots \ge w_n > 0$ where $w_i := w_{\{i\}}$. The main idea
is to measure the progress of the (1 + 1)EA with respect to the “generic” affine
function

\[ g(x_1, \dots, x_n) := 2 \sum_{1 \le i \le n/2} x_i + \sum_{n/2 < i \le n} x_i . \]

This function plays the role of a potential function in the analysis of data structures
and algorithms. Then successful steps ($x' \neq x$ and $f(x') \ge f(x)$ for the
given affine function $f$) are distinguished from unsuccessful steps. The main step
is to prove an upper bound of $O(1)$ on the expected number of successful steps
to increase the $g$-value (not the $f$-value) of the current string. The bound on the
number of unsuccessful steps follows then easily. Since the (1 + 1)EA accepts a
string according to its $f$-value, it is possible that the $g$-value decreases. The idea
is to design a slower Markoff chain where the $g$-value increases in one step by not
more than 1 and where the expected gain of the $g$-value within one successful
step is bounded below by a positive constant. Then a generalization of Wald’s
identity on stopping times can be proved and applied.

The lower bound is an easy application of the coupon collector’s theorem. □
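
The remark that an accepted step may decrease the g-value can be seen on a small example. The sketch below uses hypothetical helper names (`potential_g`, `affine`) and an arbitrarily chosen weight vector; it is an illustration of the potential function only, not of the proof.

```python
def potential_g(x):
    """Generic affine function g: weight 2 on the first half, weight 1 on the rest."""
    half = len(x) // 2
    return 2 * sum(x[:half]) + sum(x[half:])


def affine(weights, x):
    """Affine function with constant term 0 and the given degree-1 weights."""
    return sum(w * bit for w, bit in zip(weights, x))


# Example weights w_1 >= ... >= w_6 > 0 (chosen for illustration only).
weights = (10, 9, 8, 3, 2, 1)
current = [0, 1, 1, 1, 1, 1]
offspring = [1, 1, 1, 0, 0, 0]  # flips bit 1 up and bits 4, 5, 6 down
```

Here the offspring has the larger f-value (27 against 23) and would be accepted, while its g-value drops from 7 to 6. Such steps need four simultaneous bit flips, which hints at why a slower comparison chain is used in the analysis.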

So far we have not succeeded in generalizing this bound to monotone
degree-2 polynomials. Nevertheless, we state the following conjecture.

Conjecture 1. The expected optimization time of the (1 + 1)EA on monotone
polynomials of degree $d$ is bounded by $O(n (\log n) 2^d / d)$.

5 Further Results on the (1 + 1)EA and Its 
Generalizations 

In Sections 3 and 4 we have tried to present typical methods for the analysis of
the (1 + 1)EA by investigating and analyzing monotone polynomials. Wegener
(2000) presents an overview on more methods. Here we mention shortly further
directions of the research on the (1 + 1)EA and its generalizations. Droste,
Jansen, and Wegener (1998) have investigated the behavior of the (1 + 1)EA
on so-called unimodal functions, where each non-optimal string has a better
Hamming neighbor. In particular, they have disproved the claim that the (1 + 1)EA has
a polynomial expected optimization time on unimodal functions. Wegener and
Witt (2001) have shown for some special degree-2 polynomials and all squares of
affine functions that they are easy for the multi-start variant of the (1 + 1)EA,
although some of them are difficult for the (1 + 1)EA. When optimizing a single
monomial $x_1 x_2 \cdots x_d$ we are exploring for a long time the plateau of strings of
fitness 0 and it would be less efficient to accept only strict improvements. Jansen
and Wegener (2000b) investigate the problem of exploring plateaus more
generally. They also show that it is sometimes much better to accept only strict
improvements. It has been conjectured that the mutation probability $1/n$ is at
least almost optimal for the (1 + 1)EA and each $f$. This has been disproved by
Jansen and Wegener (2000a) who also have shown that it can be even better to
work with a dynamic (1 + 1)EA which changes its mutation probability following
a fixed schedule. This dynamic variant is analyzed for many functions by Jansen
and Wegener (2001b). Further strategies to change the mutation probability are
discussed by Bäck (1998).

6 A Generic Genetic Algorithm 

Evolutionary algorithms based on selection and mutation only are surprisingly 
successful. Genetic algorithms are based on selection, mutation, and crossover 
and there is a community believing that crossover is the essential operator. The 
main variants of crossover for $(a, b) \in \{0,1\}^n \times \{0,1\}^n$ are

- one-point crossover (choose $i \in \{1, \dots, n - 1\}$ randomly and create the
offspring $(a_1, \dots, a_i, b_{i+1}, \dots, b_n)$) and

- uniform crossover (choose $c \in \{0,1\}^n$ randomly and create the offspring
$d = (d_1, \dots, d_n)$ where $d_i = a_i$, if $c_i = 0$ and $d_i = b_i$, if $c_i = 1$).
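
Both operators are short to write down in code. The sketch below assumes that individuals are Python lists of bits and that the random choices are uniform; the function names are hypothetical.

```python
import random


def one_point_crossover(a, b, rng):
    """Offspring (a_1, ..., a_i, b_{i+1}, ..., b_n) for a random cut point i."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("parents must have equal length of at least 2")
    i = rng.randint(1, n - 1)  # i is uniform on {1, ..., n-1}
    return a[:i] + b[i:]


def uniform_crossover(a, b, rng):
    """Each position is taken from a (c_i = 0) or from b (c_i = 1) independently."""
    if len(a) != len(b):
        raise ValueError("parents must have equal length")
    return [ai if rng.randint(0, 1) == 0 else bi for ai, bi in zip(a, b)]
```

Neither operator can create a bit value at a position where both parents agree, which is one way to see why enough diversity in the population is needed for crossover to have any effect.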

In order to apply crossover we need a population of size larger than 1. The 
main problem is to combine fitness-based selection with the preservation of 
enough diversity such that crossover has a chance to create strings different from 
those in the population. In the following it is sufficient to require that selection
chooses $x$ with at least the same probability as $x'$ if $f(x) \ge f(x')$. This implies
the same selection probabilities for $x$ and $x'$ if $f(x) = f(x')$. Many genetic algorithms
replace a population within one step with a possibly totally different
new population. It is easier to analyze so-called steady-state genetic algorithms 
where in each step only one offspring is created and perhaps exchanged with one 
member from the current population. 

Algorithm 2 (Steady-state GA). 

1) Initialization: The s(n) members of the current population are chosen ran- 
domly and independently. 

2) Branching: With probability $p_c(n)$, the new offspring is created with crossover
(Steps 3.1, 3.2, 3.3) and with the remaining probability, the new
offspring is created without crossover (Steps 4.1, 4.2).

3.1) Selection for crossover and mutation: A pair of strings (x,y) from the cur- 
rent population is chosen. 

3.2) Crossover: z' is the result of crossover on (x,y). 

3.3) Mutation: z is the result of mutation of z' . Go to Step 5. 

4.1) Selection for mutation: A string x from the current population is chosen. 

4.2) Mutation: z is the result of mutation of x. 

5) Selection of the next generation: Add $z$ to the current population and let
$W$ be the multi-set of strings in the enlarged population which have the
minimal $f$-value and let $W'$ be the set of strings in $W$ which have the largest
number of copies in $W$. Eliminate randomly one string from $W'$ from the
population to obtain the new population.

6) Continue at Step 2 (until some stopping criterion is fulfilled). 
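
Step 5 is the part of Algorithm 2 that protects diversity, and it can be sketched as follows. The sketch assumes that individuals are tuples of bits and reads "eliminate randomly one string from W'" as a uniform choice among the distinct strings in W'; the function name `select_next_generation` is hypothetical.

```python
import random
from collections import Counter


def select_next_generation(population, z, f, rng):
    """Add z, then remove one copy of a worst string with the most copies."""
    enlarged = list(population) + [z]

    # W: multi-set of strings with minimal f-value in the enlarged population.
    min_value = min(f(x) for x in enlarged)
    worst_counts = Counter(x for x in enlarged if f(x) == min_value)

    # W': the strings of W with the largest number of copies in W.
    max_copies = max(worst_counts.values())
    candidates = [x for x, c in worst_counts.items() if c == max_copies]

    # Remove a single copy of a randomly chosen string from W'.
    victim = rng.choice(candidates)
    enlarged.remove(victim)
    return enlarged
```

Preferring the worst string with the most copies means that duplicates are removed before unique individuals of the same fitness, so the population does not collapse to copies of one string more quickly than necessary.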

The analysis of genetic algorithms is even more difficult than the analysis of
evolutionary algorithms without crossover. Although the crossover operator has
been in the focus of research for forty years, there was no example known where crossover
decreases the expected optimization time from exponential to polynomial.
Experiments (Forrest and Mitchell (1993)) show that the (1 + 1)EA is for the
royal road functions even faster than genetic algorithms. Watson (2000) presents
a function where crossover probably helps. This is established by experiments
and by a proof under some assumptions but not by a rigorous proof.

7 Real Royal Road Functions and the Crossover Operator 

Jansen and Wegener (2001a) present the first example where crossover provably 
decreases the expected optimization time from exponential to polynomial. 
Because of the history and the many discussions on the royal road functions 
they have called their functions real royal road functions. For $a \in \{0,1\}^n$ 
let $|a| = a_1 + \cdots + a_n$ and let $b(a)$ be the block size of $a$, i.e. the 
length of the longest block consisting of ones only (the largest $l$ such that 
$a_i = a_{i+1} = \cdots = a_{i+l-1} = 1$ for some $i$). Then 

$$R_{n,m}(a) = \begin{cases} 2n^2 & \text{if } a = (1, 1, \ldots, 1) \\ n|a| + b(a) & \text{if } |a| \le n - m \\ 0 & \text{otherwise.} \end{cases}$$
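
The definition can be made concrete with a short Python sketch. It assumes the reading of the (partly damaged) formula in which the optimum has value 2n^2 and the middle case applies when the number of ones is at most n - m; the function names `block_size` and `real_royal_road` are hypothetical.

```python
def block_size(a):
    """Length of the longest run of consecutive ones in the bit list a."""
    best = run = 0
    for bit in a:
        run = run + 1 if bit == 1 else 0
        best = max(best, run)
    return best

def real_royal_road(a, m):
    """Sketch of R_{n,m}(a) for a bit list a of length n (assumed reading)."""
    n = len(a)
    ones = sum(a)
    if ones == n:
        return 2 * n * n            # the unique optimum 1^n
    if ones <= n - m:
        return n * ones + block_size(a)
    return 0                        # the gap between n - m and n ones

if __name__ == "__main__":
    for a in ([1, 1, 1, 1, 0, 0], [1, 1, 0, 1, 1, 0], [1, 1, 1, 1, 1, 0]):
        print(a, real_royal_road(a, 2))
```

Among strings with the same number of ones, the term b(a) rewards a single long block, and strings with more than n - m but fewer than n ones are worthless, so mutation alone has to jump across a gap of width m.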



For a proof of the following lemma see Jansen and Wegener (2001a). 

Lemma 2. Evolutionary algorithms without crossover need with a probability 
exponentially close to 1 exponentially many steps to optimize the real royal road 
function $R_{n,\lceil n/3 \rceil}$ and with a probability of $1 - \ldots$ superpolynomially many 
steps to optimize $R_{n,\lceil \log n \rceil}$. 



Theorem 4. Let $s(n) = n$, $m = \lceil n/3 \rceil$ and $p_c$ a positive constant less than 
1. Then the expected optimization time of the steady-state GA with one-point 
crossover on Rn,m is bounded by O(n^). 

Sketch of Proof. Here we use the proof technique to describe intermediate 
aims and to estimate the expected time until the aim is reached. The advantage 
is that we can use afterwards the assumption that the last aim has been reached. 
Aim 1: All strings of the population have exactly n — m ones or we have found 
the optimum. 

This aim is reached in an expected number of O(n^) steps. It is very unlikely 
to start with strings with more than $n - m$ and less than $n$ ones. The expected 
time to eliminate all these strings is $O(1)$. If we then do not find the optimum, we 
only have an expected waiting time of $O(n/m) = O(1)$ to increase the number 
of ones in the population. This is due to steps with mutation only. If the selected 
string has less than $n - m$ ones, there is a good chance to increase the number 
of ones by a 1-bit mutation. If the selected string has exactly $n - m$ ones, there 
is a good chance to produce a replica. 

Aim 2: All strings of the population have exactly n — m ones and a block size of 
n — m or we have found the optimum. 

This aim is reached in an expected number of 0{n^ log n) steps. If we do not 
find the optimum, we only have to increase the sum of the block lengths of the 
strings of the current population. If not all strings have the same block length, 
it is sufficient to produce a replica of a string with a non-minimal block length. 
Otherwise, certain 2-bit mutations increase the block length. 

Aim 3: All strings of the population have exactly $n - m$ ones, a block size of 
$n - m$, and each of the $m + 1$ different strings with this property is contained in 
the population or we have found the optimum. 

This aim is reached in an expected number of O(n^) steps. If we do not find 
the optimum, there is always at least one string in the current population such 
that a 2-bit mutation creates a string with n — m ones and block size n — m 
which was not in the population before. 

Aim 4: The optimum is found. 

This aim is reached in an expected number of O(n^) steps. This is the only 
phase where crossover is essential. With a probability of at least $p_c(n)/n^2$ 
crossover is chosen as search operator and $1^{n-m}0^m$ and $0^m1^{n-m}$ are selected. Then, 
with a probability of at least 1/3, one-point crossover creates $1^n$ and finally, 
with a probability of at least e“^, mutation preserves 1” and we have found the 
optimum. □ 

Uniform crossover is less efficient for these functions. The probability of 
creating $1^n$ from $1^{n-m}0^m$ and $0^m1^{n-m}$ is only $2^{-2m}$. This leads to a polynomial 
expected optimization time only if $m = O(\log n)$. Hence, crossover reduces the 
expected optimization time for some functions only from superpolynomial to 
polynomial. Jansen and Wegener (2001a) have presented a more complicated 
function where uniform crossover decreases the expected optimization time from 
exponential to polynomial. 
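
The two success probabilities can be checked by brute force for small parameters. The sketch below is an illustration under assumptions: one-point crossover is taken to pick a cut position uniformly from 1, ..., n-1 and to join the prefix of the first parent with the suffix of the second, and uniform crossover is taken to choose each bit independently from either parent with probability 1/2. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def parents(n, m):
    """The two extreme plateau strings 1^(n-m) 0^m and 0^m 1^(n-m)."""
    return [1] * (n - m) + [0] * m, [0] * m + [1] * (n - m)

def one_point_success(n, m):
    """Probability that one-point crossover of the two parents yields 1^n."""
    x, y = parents(n, m)
    good = sum(1 for cut in range(1, n) if all(x[:cut] + y[cut:]))
    return Fraction(good, n - 1)

def uniform_success(n, m):
    """Probability that uniform crossover of the two parents yields 1^n."""
    x, y = parents(n, m)
    good = 0
    for mask in product((0, 1), repeat=n):
        child = [x[i] if mask[i] else y[i] for i in range(n)]
        if all(child):
            good += 1
    return Fraction(good, 2 ** n)

if __name__ == "__main__":
    print(one_point_success(9, 3), uniform_success(9, 3))
```

For one-point crossover every cut inside the region where both parents carry ones succeeds, so the probability stays constant for m close to n/3, whereas uniform crossover has to pick the right parent at each of the 2m positions where the parents differ.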

One may ask what happens if we replace in the definition of $R_{n,m}$ the value 
of $b(a)$ by 0. Then the size of the plateau of the second-best strings increases 
from $m + 1$ to $\binom{n}{m}$ and it is much harder to generate enough diversity. 
Jansen and Wegener (1999) have investigated this function. With uniform crossover 
and the very small crossover probability Pc{n) = l/(nlog^ n) they could prove a 
polynomial expected optimization time for $m = O(\log n)$. This proof is 
technically much more involved than the proof of Theorem 4 and its counterpart for 
uniform crossover. Altogether, we have only made the first steps of analyzing 
genetic algorithms with crossover. 



Conclusions 

We have argued why one should investigate and analyze different forms of 
randomized search heuristics, among them evolutionary algorithms. The differences in 
the analysis of problem-specific algorithms and general search heuristics for black 
box optimization have been discussed. Then our approach has been presented by 
analyzing some evolutionary algorithms on subclasses of the class of monotone 
polynomials and by proving for the first time that crossover can decrease the 
expected optimization time significantly. 
Abstract. Let $C(A)$ denote the multiplicative complexity of a finite 
dimensional associative $k$-algebra $A$. 

For algebras $A$ with nonzero radical $\operatorname{rad} A$, we exhibit several lower bound 
techniques for $C(A)$ that yield bounds significantly above the Alder-Strassen 
bound. In particular, we prove that the multiplicative complexity 
of the multiplication in the algebras $k[X_1, \ldots, X_n]/I_{d+1}(X_1, \ldots, X_n)$ 
is bounded from below by 
$3\binom{n+d}{n} - \binom{n+\lceil d/2 \rceil}{n} - \binom{n+\lfloor d/2 \rfloor}{n}$, where 
$I_d(X_1, \ldots, X_n)$ denotes the ideal generated by all monomials of degree 
$d$ in $X_1, \ldots, X_n$. Furthermore, we show the lower bound 
$C(T_n(k)) \ge (2\tfrac{1}{8} - o(1)) \dim T_n(k)$ for the multiplication of upper 
triangular matrices. 



1 Introduction 

A fundamental problem in algebraic complexity theory is the question about 
the costs of multiplication, say of matrices, triangular matrices, polynomials, or 
power series, just to mention a few. To be more specific, let $A$ be a finite 
dimensional associative $k$-algebra with unity 1. By fixing a basis of $A$, say $v_1, \ldots, v_N$, 
we can define a set of bilinear forms corresponding to the multiplication in $A$. If 
$v_\mu v_\nu = \sum_{\kappa=1}^{N} \alpha_{\mu,\nu}^{(\kappa)} v_\kappa$ for $1 \le \mu, \nu \le N$ 
with structural constants $\alpha_{\mu,\nu}^{(\kappa)} \in k$, then 
these constants and the identity 

$$\Bigl(\sum_{\mu=1}^{N} X_\mu v_\mu\Bigr)\Bigl(\sum_{\nu=1}^{N} Y_\nu v_\nu\Bigr) = \sum_{\kappa=1}^{N} b_\kappa(X, Y)\, v_\kappa$$

define the desired bilinear forms $b_1, \ldots, b_N$. The multiplicative complexity of 
$b_1, \ldots, b_N$ is the smallest number of essential multiplications and divisions 
necessary and sufficient to compute $b_1, \ldots, b_N$ from the indeterminates $X_1, \ldots, X_N$ 
and $Y_1, \ldots, Y_N$. 

According to Strassen HSI, we may reformulate the problem over infinite 
fields as follows: the multiplicative complexity of $b_1, \ldots, b_N$ is the smallest 
number $\ell$ of products $p_\lambda = u_\lambda(X_i, Y_j) \cdot v_\lambda(X_i, Y_j)$ with linear forms $u_\lambda$ and $v_\lambda$ in 
the $X_i$ and $Y_j$ such that the linear span of $p_1, \ldots, p_\ell$ contains $b_1, \ldots, b_N$. (The 
restriction to infinite fields is not critical for this work, since we are concerned 
with lower bounds.) From this characterization, it follows that the multiplicative 
complexity of $b_1, \ldots, b_N$ does not depend on the choice of $v_1, \ldots, v_N$, thus we 
may speak about the multiplicative complexity of (the multiplication in) $A$. For 
a modern introduction to algebraic complexity theory, we recommend [5]. 

A fundamental lower bound for the multiplicative complexity is the so-called 
Alder-Strassen bound 0 (see Sectio n 11.211 . Recently, this bound has been im- 
proved for a large class of semisimplqj algebras |S| as shown in 0. The main 
contributions of this work are improvements of the Alder-Strassen bound for 
algebras with nonzero radical, like upper triangular matrices. 

Closely related to the multiplicative complexity is the bilinear complexity 
(or rank). Here the products $p_\lambda = u_\lambda(X_i) \cdot v_\lambda(Y_j)$ are bilinear products, that 
is, products of linear forms $u_\lambda$ in the $X_i$ and linear forms $v_\lambda$ in the $Y_j$. (Note 
that $b_1, \ldots, b_N$ are bilinear forms.) The multiplicative complexity is clearly a 
lower bound for the bilinear complexity and it is easy to show that twice the 
multiplicative complexity is an upper bound for the bilinear complexity (see e.g. 
10 Eq. 14.8]). Therefore, we usually want to derive upper bounds for the bilinear 
complexity and lower bounds for the multiplicative complexity. 

While the difference between multiplicative and bilinear complexity seems 
to be minor at a first glance, it is much harder to cope with the multiplica- 
tive complexity when dealing with lower bounds. The main reason is the fact 
that the bilinear complexity of a tensor of a bilinear map (see below for a def- 
inition) is invariant under permutations whereas the multiplicative complexity 
might not, see also 0 Chap. 14.2] for a further discussion. For instance, if we 
consider bilinear complexity, then the bound (2) holds also for any algebra with 
nonzero radical, provided that the semisimple quotient algebra $A/\operatorname{rad} A$ fulfils 
the corresponding premises. On the other hand, there are examples given 

in 0 Sect. 6] that show that the novel methods from 0 for the multiplicative 
complexity do not apply to algebras with nonzero radical. 

1.1 Model of Computation 

For proving lower bounds, a coordinate- free definition of multiplicative complex- 
ity is often more appropriate than the one given above, see e.g. 0 Chap. 14.1]. 
In the following, if $V$ is a vector space, let $V^*$ denote the dual space of $V$, i.e., 
the vector space of all linear forms on $V$. 

Definition 1. Let $k$ be a field, $U$, $V$, and $W$ finite dimensional vector spaces 
over $k$, and $\phi : U \times V \to W$ a bilinear map. 

1. A sequence $\beta = (f_1, g_1, w_1, \ldots, f_\ell, g_\ell, w_\ell)$ with $f_\lambda, g_\lambda \in (U \times V)^*$ and $w_\lambda \in W$ 
is called a quadratic computation for $\phi$ over $k$ of length $\ell$ if 

$$\phi(u, v) = \sum_{\lambda=1}^{\ell} f_\lambda(u, v)\, g_\lambda(u, v)\, w_\lambda \quad \text{for all } u \in U,\ v \in V.$$

^ For a finite dimensional associative fc-algebra A with unity, the radical rad A is the 
intersection of all maximal twosided ideals of A. An algebra A is called semisimple 
if rad A = {0}, see pn| for more details. 
2. The length of a shortest quadratic computation for $\phi$ is called the 
multiplicative complexity of $\phi$ and is denoted by $C(\phi)$. 

3. If $A$ is a finite dimensional associative $k$-algebra with unity, then the 
multiplicative complexity of $A$ is defined as the multiplicative complexity of the 
multiplication map of $A$, which is a bilinear map $A \times A \to A$, and is denoted 
by $C(A)$. 

If we require that $f_\lambda \in U^*$ and $g_\lambda \in V^*$ in the above Definition 1, we get 
bilinear computations and bilinear complexity (also called rank). We denote the 
bilinear complexity of a bilinear map $\phi$ by $R(\phi)$ and the bilinear complexity of 
an associative algebra $A$ by $R(A)$. We have $C(\phi) \le R(\phi) \le 2 \cdot C(\phi)$ for any 
bilinear map $\phi$. Except for trivial cases, the second inequality is always strict, 
see )T^ . 
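
A standard small instance of a bilinear computation is the multiplication of complex numbers, viewed as a two-dimensional algebra over the reals. The Python sketch below (an illustration added here, with hypothetical function names) uses three products of a linear form in $u = (a, b)$ with a linear form in $v = (c, d)$, so it is a bilinear computation of length 3 in the sense of Definition 1. For this algebra the Alder-Strassen bound (1) stated in Section 1.2 gives $2 \cdot 2 - 1 = 3$, so three products are optimal.

```python
def multiply_direct(u, v):
    """(a + bi)(c + di) computed from the definition, with four products."""
    a, b = u
    c, d = v
    return (a * c - b * d, a * d + b * c)

def multiply_bilinear(u, v):
    """Bilinear computation of length 3: sum of f_i(u) * g_i(v) * w_i."""
    a, b = u
    c, d = v
    f = (a + b, a, b)              # linear forms in u only
    g = (c, d - c, c + d)          # linear forms in v only
    w = ((1, 1), (0, 1), (-1, 0))  # vectors in the target space (real, imag)
    real = sum(f[i] * g[i] * w[i][0] for i in range(3))
    imag = sum(f[i] * g[i] * w[i][1] for i in range(3))
    return (real, imag)

if __name__ == "__main__":
    print(multiply_bilinear((1, 2), (3, 4)), multiply_direct((1, 2), (3, 4)))
```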

1.2 Previous Results 

The best general lower bound for the multiplicative complexity of an associative 
algebra A is due to Alder and Strassen PJj they show 

$$C(A) \ge 2 \dim A - t, \qquad (1)$$

where t is the number of maximal twosided ideals in A. This has recently been 
improved for a large class of semisimple algebras by Blaser jSj: if A is semisimple 
and if in the decomposition of $A = A_1 \times \cdots \times A_t$ into simple factors $A_\tau = D_\tau^{n_\tau \times n_\tau}$ 
with division algebra $D_\tau$, each $A_\tau$ is noncommutative, then 

$$C(A) \ge \tfrac{5}{2} \dim A - 3(n_1 + \cdots + n_t). \qquad (2)$$

Specifically, the multiplicative complexity of $n \times n$-matrix multiplication is at 
least $\tfrac{5}{2} n^2 - 3n$. While the lower bound (2) also holds for algebras with nonzero 
radical in the case of bilinear complexity PEI provided that A / rad A fulfils the 
above condition, examples are presented in pj Sec. 6] that show that those meth- 
ods do not transfer to the multiplicative complexity for algebras with nonzero 
radical. 
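
To see how the two bounds compare for the full matrix algebra $k^{n \times n}$ (dimension $n^2$, one maximal twosided ideal, one simple factor with $n_1 = n$), they can simply be evaluated. The Python sketch below is an added illustration; the function names are hypothetical and the constant 5/2 is the reading of bound (2) assumed here.

```python
from fractions import Fraction

def alder_strassen_matrix_bound(n):
    """Bound (1) for k^{n x n}: 2 * dim A - t with dim A = n^2 and t = 1."""
    return 2 * n * n - 1

def improved_matrix_bound(n):
    """Bound (2) for k^{n x n}: (5/2) * dim A - 3 * n (assumed constant 5/2)."""
    return Fraction(5, 2) * n * n - 3 * n

def first_improvement():
    """Smallest n >= 2 for which bound (2) exceeds bound (1)."""
    n = 2
    while improved_matrix_bound(n) <= alder_strassen_matrix_bound(n):
        n += 1
    return n

if __name__ == "__main__":
    for n in (2, 6, 100):
        print(n, alder_strassen_matrix_bound(n), improved_matrix_bound(n))
```

The linear term $3n$ means that the improved bound only overtakes the Alder-Strassen bound once $n$ is moderately large; asymptotically the leading constant rises from 2 to 5/2.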

1.3 New Results 

The starting point of our work is the above observation that the results from Pl 
1^ for the bilinear complexity do not transfer to the multiplicative complexity if 
A has nonzero radical. As our main contribution, we improve the Alder-Strassen 
bound o for various classes of algebras with nonzero radical. 

As our first main result, we obtain the lower bound 

$$C(A) \ge \dim A + \dim(\operatorname{rad} A)^m + \dim(\operatorname{rad} A)^n - \dim(\operatorname{rad} A)^{m+n-1}$$

for any $n, m \ge 1$ (Theorem 1). This bound works very well for algebras with 
“growing” radical, that means, $2 \dim(\operatorname{rad} A)^m$ is much larger than $\dim \operatorname{rad} A$ 
where $m$ is the smallest natural number such that $(\operatorname{rad} A)^{2m-1} = \{0\}$. As an 
example, we apply this bound to the algebras $k[X_1, \ldots, X_n]/I_{d+1}(X_1, \ldots, X_n)$ 
where $I_d(X_1, \ldots, X_n)$ denotes the ideal generated by all monomials of degree 
$d$ in $X_1, \ldots, X_n$. Furthermore, we obtain a sequence of explicitly given 
algebras $A_n$ with $C(A_n) \ge (3 - o(1)) \dim A_n$. To our knowledge, this is the best 
lower bound known for a concrete algebra. (For existential results of algebras of 
high complexity, see [7].) 

More complicated is the case of algebras with “nongrowing” radical, the most 
important one is probably the algebra of upper triangular $n \times n$-matrices $T_n(k)$. 
We deal with this in Section 5. Our second main result is Lemma 7. This lemma 
provides a way to prove lower bounds above the Alder-Strassen bound for 
algebras $A$ with “nongrowing” radical. Here, $C(A)$ is estimated from below by the 
multiplicative complexity of a bilinear map $\psi$ obtained from the multiplication 
in $A$ by restricting to some subspaces. After that, $C(\psi)$ can be estimated using 
techniques introduced in |E| . In Section El we apply this method to the algebra 
of upper triangular matrices and get the lower bound 

$$C(T_n(k)) \ge (2\tfrac{1}{8} - o(1)) \dim T_n(k).$$

This is the first bound for $T_n(k)$ significantly above the Alder-Strassen bound. 
Prior to this, we only knew that the Alder-Strassen bound could be improved 
by an additive amount of one for n > 3 im, that is, T„(fc) is not an algebra of 
minimal rank. 

2 Preliminaries 

For the reader’s convenience, we compile some preliminaries to which we will 
refer frequently in the subsequent sections. In the first part of this section, we 
present an alternative characterization of multiplicative complexity, the so-called 
“tensorial notion” and state the relevant results. In the second part, we briefly 
review the lower bound techniques used by Alder and Strassen to prove their 
lower bound. These techniques basically are sophisticated refinements of the 
substitution method due to Pan m 

2.1 Characterizations of Multiplicative Complexity 

In the previous section, we have introduced the multiplicative complexity of a 
bilinear map in terms of computations. A second useful characterization of mul- 
tiplicative complexity is the so-called “tensorial notion” (see Chap. 14.4] for 
the bilinear complexity). With a bilinear map $\phi : U \times V \to W$, we may associate 
a coordinate tensor (or tensor for short) which is basically a “three-dimensional 
matrix”: we fix bases $u_1, \ldots, u_m$ of $U$, $v_1, \ldots, v_n$ of $V$, and $w_1, \ldots, w_p$ of $W$. 
There are unique scalars $t_{\mu,\nu,\rho} \in k$ such that 
$\phi(u_\mu, v_\nu) = \sum_{\rho=1}^{p} t_{\mu,\nu,\rho} w_\rho$ for all 
$1 \le \mu \le m$, $1 \le \nu \le n$. Then $t = (t_{\mu,\nu,\rho}) \in k^{m \times n \times p}$ is the tensor of $\phi$ (with 
respect to the chosen bases). On the other hand, any given tensor also defines 
a bilinear map after choosing bases. We define the multiplicative complexity of 
the tensor $t$ by $C(t) := C(\phi)$. In the same way, the bilinear complexity of $t$ is 
$R(t) := R(\phi)$. (This is in both cases well-defined.) 

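With each tensor $t = (t_{\mu,\nu,\rho}) \in k^{m \times n \times p}$, we may associate three sets of 
matrices, the slices of $t$. The matrices 
$(t_{\mu,\nu,\rho})_{1 \le \nu \le n,\, 1 \le \rho \le p} \in k^{n \times p}$ with $1 \le \mu \le m$ 
are called the 1-slices of $t$, the matrices 
$S_\nu = (t_{\mu,\nu,\rho})_{1 \le \mu \le m,\, 1 \le \rho \le p} \in k^{m \times p}$ with 
$1 \le \nu \le n$ the 2-slices, and finally 
$T_\rho = (t_{\mu,\nu,\rho})_{1 \le \mu \le m,\, 1 \le \nu \le n} \in k^{m \times n}$ with 
$1 \le \rho \le p$ are called the 3-slices of $t$. When dealing with bilinear complexity, it 
makes no difference which of the three sets of slices we consider. In the case of 
multiplicative complexity, however, the 3-slices play a distinguished role. 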

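Lemma 1. Let $k$ be a field and $t$ be a tensor with 3-slices $T_1, \ldots, T_p \in k^{m \times n}$. 
Then $C(t) \le \ell$ if and only if there are (column) vectors $u_\lambda, v_\lambda \in k^{m+n}$ for 
$1 \le \lambda \le \ell$ such that with $P_\lambda := u_\lambda \cdot v_\lambda^\top \in k^{(m+n) \times (m+n)}$ 

$$\begin{pmatrix} 0 & T_1 \\ T_1^\top & 0 \end{pmatrix}, \ldots, \begin{pmatrix} 0 & T_p \\ T_p^\top & 0 \end{pmatrix} \in \operatorname{lin}\{P_1 + P_1^\top, \ldots, P_\ell + P_\ell^\top\}. \qquad (3)$$

Here, $T^\top$ denotes the transpose of a matrix $T$ and $\operatorname{lin}\{\ldots\}$ denotes the linear 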
span. A proof of this lemma is straight forward (see for instance [T21 Thm. 3.2]). 

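If $T_1, \ldots, T_p$ are the 3-slices of a tensor $t$, we will occasionally write 
$C(T_1, \ldots, T_p)$ instead of $C(t)$ and $R(T_1, \ldots, T_p)$ instead of $R(t)$. By 
multiplying (3) with 

$$\begin{pmatrix} X & 0 \\ 0 & Y^\top \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} X^\top & 0 \\ 0 & Y \end{pmatrix}$$

from the left and right, respectively, it follows from Lemma 1 that if $X \in k^{m \times m}$ 
and $Y \in k^{n \times n}$ are invertible matrices, then 

$$C(T_1, \ldots, T_p) = C(X \cdot T_1 \cdot Y, \ldots, X \cdot T_p \cdot Y). \qquad (4)$$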



2.2 The Lower Bound Techniques of Alder and Strassen 

Beside the original paper of Alder and Strassen, ^3 Chap. IV. 2] and ^ Chap. 17] 
are excellent treatments of the results of Alder and Strassen. We have taken the 
term “separate” and the extension lemma from there, but everything is also 
contained in the work of Alder and Strassen PJ ■ 

Definition 2. Let $U$, $V$, $W$ be vector spaces and $\beta = (f_1, g_1, w_1, \ldots, f_\ell, g_\ell, w_\ell)$ 
be a quadratic computation for a bilinear map $\phi : U \times V \to W$. Let $U_1 \subseteq U$, 
$V_1 \subseteq V$, and $W_1 \subseteq W$ be subspaces. The computation $\beta$ separates $(U_1, V_1, W_1)$, 
if there is a set of indices $I \subseteq \{\lambda \mid w_\lambda \notin W_1\}$ such that after possibly exchanging 
some of the $f_\lambda$ with the corresponding $g_\lambda$, we have $(U_1 \times V_1) \cap \bigcap_{i \in I} \ker f_i = \{0\}$. 

The latter condition is equivalent to the condition that $(f_i|_{U_1 \times V_1})_{i \in I}$ generate 
the dual space $(U_1 \times V_1)^*$. 

If $\phi : U \times V \to W$ is a bilinear map and $U_1 \subseteq U$ and $V_1 \subseteq V$ are subspaces, 
then $(u + U_1, v + V_1) \mapsto \phi(u, v) + \tilde{W}$ defines a bilinear map $U/U_1 \times V/V_1 \to W/\tilde{W}$ 
where $\tilde{W} := \operatorname{lin}\{\phi(U_1, V)\} + \operatorname{lin}\{\phi(U, V_1)\}$. This map is called the quotient of $\phi$ 
by $U_1$ and $V_1$ and is denoted by $\phi/(U_1 \times V_1)$. We have the following lower bound. 
(See |Hl Lem. 17.17] where also a proof is given.) 

Lemma 2. Let $U$, $V$, and $W$ be vector spaces and $\beta = (f_1, g_1, w_1, \ldots, f_\ell, g_\ell, w_\ell)$ 
be a quadratic computation for some bilinear map $\phi : U \times V \to W$. Let $U_1 \subseteq U$, 
$V_1 \subseteq V$, and $W_1 \subseteq W$ be subspaces such that $\beta$ separates $(U_1, V_1, W_1)$. Let $\pi$ be 
an endomorphism of $W$ such that $W_1 \subseteq \ker \pi$. Then 

$$\ell \ge C((\pi \circ \phi)/(U_1 \times V_1)) + \dim U_1 + \dim V_1 + \#\{\lambda \mid w_\lambda \in W_1\}.$$

If $\phi$ is the multiplication map of an associative algebra $A$ and $I$ is a twosided 
ideal of $A$, then $\phi/(I \times I)$ is the multiplication map of the quotient algebra $A/I$. 

Corollary 1. Let $A$ be an algebra and $I \subseteq A$ a twosided ideal. Let $\beta$ be a 
quadratic computation for $A$ of length $\ell$ that separates $(I, I, \{0\})$. Then 
$\ell \ge C(A/I) + 2 \dim I$. 

To achieve good lower bounds by means of Lemma 2 one has to find an 
optimal bilinear computation which separates a “large” triple. An important 
tool to solve this task is the following “extension lemma”, see |Bl Lem. 17.18]. 

Lemma 3 (Alder and Strassen). Let $U$, $V$, $W$ be vector spaces and $\beta$ be 
a quadratic computation for a bilinear map $\phi : U \times V \to W$. Let $U_1 \subseteq U_2 \subseteq U$, 
$V_1 \subseteq V$, and $W_1 \subseteq W$ be subspaces such that $\beta$ separates $(U_1, V_1, W_1)$. Then $\beta$ 
separates also $(U_2, V_1, W_1)$, or there is some $u \in U_2 \setminus U_1$ such that 
$\phi(u, V) \subseteq \operatorname{lin}\{\phi(U_1, V_1)\} + W_1$. 

In the course of their proof, Alder and Strassen first deal with the radical of 
an algebra A and then turn to the semisimple quotient algebra A/ rad A. The 
following lemma contains the first of the two important statements established 
by Alder and Strassen, see PJ Lem. 2] or 0 Prop. 17.20]. 

Lemma 4 (Alder and Strassen). Let $\beta$ be a quadratic computation for an 
associative algebra $A$. Then $\beta$ separates $(\operatorname{rad} A, \operatorname{rad} A, \{0\})$. 

3 Lower Bounds: “Growing” Radicals 

In this section, we develop a method that works very well if the radical of an 
algebra $A$ “grows”, more precisely, $2 \dim(\operatorname{rad} A)^m$ is much larger than $\dim \operatorname{rad} A$ 
where $m$ is the smallest natural number such that $(\operatorname{rad} A)^{2m-1} = \{0\}$. This 
method has already been applied to the bilinear complexity, see m In contrast 
to the “hard” case (in Section 5), we do not lose anything when extending this 
method to the multiplicative complexity. 

We build upon the following lower bound which is given in 0, Lem. 2]. 
Lemma 5. Let $A$ be an associative algebra. If $U, V \subseteq \operatorname{rad} A$ are vector spaces, 
then 

$$C(A) \ge \dim U + \dim V + \min_{X \oplus (U \times V) = A \times A} \dim \operatorname{lin}\{x x' \mid (x, x') \in X\}.$$



To put this lemma into effective use, we have to estimate the dimension of 
the subspaces $\operatorname{lin}\{x x' \mid (x, x') \in X\}$ in the above Lemma 5. This is done through 
the following lemma. 

Lemma 6. Let $A$ be an associative algebra and $m, n \ge 1$ be natural numbers. 
For any vector space $X$ such that $X \oplus ((\operatorname{rad} A)^m \times (\operatorname{rad} A)^n) = A \times A$, 

$$\operatorname{lin}\{x x' \mid (x, x') \in X\} + (\operatorname{rad} A)^{m+n-1} = A.$$

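Proof. For ease of notation, if $Y \subseteq A \times A$ is a vector space, let $\hat{Y}$ denote the 
vector space $\operatorname{lin}\{y y' \mid (y, y') \in Y\}$. 

For $0 \le \mu < m$ and $0 \le \nu < n$, let $X_{\mu,\nu} \subseteq X$ be a vector space such that 
$X_{\mu,\nu} \oplus ((\operatorname{rad} A)^m \times (\operatorname{rad} A)^n) = (\operatorname{rad} A)^\mu \times (\operatorname{rad} A)^\nu$. Such an $X_{\mu,\nu}$ exists, because 
$X \oplus ((\operatorname{rad} A)^m \times (\operatorname{rad} A)^n) = A \times A$. As $X \cap ((\operatorname{rad} A)^m \times (\operatorname{rad} A)^n) = \{0\}$, it is 
also unique. Furthermore, $X_{\mu,\nu} \subseteq X_{\mu',\nu'}$ for $\mu' \le \mu$ and $\nu' \le \nu$. 

For any $(u, v) \in (\operatorname{rad} A)^\mu \times (\operatorname{rad} A)^\nu$, there are $(a, b) \in (\operatorname{rad} A)^m \times (\operatorname{rad} A)^n$ 
such that $(u, v) + (a, b) \in X_{\mu,\nu}$. Thus $(u + a)(v + b) \in \hat{X}_{\mu,\nu}$. But 
$(u + a)(v + b) \in uv + (\operatorname{rad} A)^{\mu+\nu+1}$. Letting $(u, v)$ run through all elements of 
$(\operatorname{rad} A)^\mu \times (\operatorname{rad} A)^\nu$, we get 

$$\hat{X}_{\mu,\nu} + (\operatorname{rad} A)^{\mu+\nu+1} = (\operatorname{rad} A)^{\mu+\nu}. \qquad (5)$$

We now prove by backward induction in $\mu + \nu$ that 

$$\hat{X}_{\mu,\nu} + (\operatorname{rad} A)^{m+n-1} = (\operatorname{rad} A)^{\mu+\nu} \quad \text{for all } \mu < m,\ \nu < n.$$

For $\mu = \nu = 0$, this is the claim of the lemma. 

The induction start ($\mu = m - 1$, $\nu = n - 1$) follows directly from (5). For the 
induction step, let $\mu$ and $\nu$ be given such that $\mu + \nu < m + n - 2$. We assume 
that $\mu < m - 1$, the case $\nu < n - 1$ follows completely alike. By substituting the 
induction hypothesis $\hat{X}_{\mu+1,\nu} + (\operatorname{rad} A)^{m+n-1} = (\operatorname{rad} A)^{\mu+\nu+1}$ into (5), we obtain 
$\hat{X}_{\mu,\nu} + \hat{X}_{\mu+1,\nu} + (\operatorname{rad} A)^{m+n-1} = (\operatorname{rad} A)^{\mu+\nu}$. Now the claim follows from the 
fact that $\hat{U} \subseteq \hat{V}$ if $U \subseteq V$. □ 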

Combining the last lemma with Lemma 5 shows the following lower bound. 

Theorem 1. Let $A$ be an associative algebra. For all $m, n \ge 1$, 

$$C(A) \ge \dim A + \dim(\operatorname{rad} A)^m + \dim(\operatorname{rad} A)^n - \dim(\operatorname{rad} A)^{m+n-1}.$$
4 Multiplying Multivariate Power Series 

For indeterminates $X_1, \ldots, X_n$, let $I_d(X_1, \ldots, X_n)$ denote the ideal 
generated by all monomials of degree $d$. Let $P_{n,d}$ be the algebra 
$k[X_1, \ldots, X_n]/I_{d+1}(X_1, \ldots, X_n)$. The multiplication in $P_{n,d}$ can be 
interpreted as the multiplication of $n$-variate power series where we only compute 
with the coefficients of the monomials with degree at most $d$. For the algebras 
$P_{n,d}$, the methods of the preceding section give a nice lower bound. The 
theorem below follows at once from Theorem 1 and the fact that the dimension of 
$P_{n,d}$ is $\binom{n+d}{n}$. 

Theorem 2. For any $n > 0$ and $d > 0$, 

$$C(P_{n,d}) \ge 3\binom{n+d}{n} - \binom{n+\lceil d/2 \rceil}{n} - \binom{n+\lfloor d/2 \rfloor}{n}.$$

Remark 1. If we keep $d > 1$ fixed, then $C(P_{n,d}) \ge (3 - o(1)) \dim P_{n,d}$ (as a 
function in $n$). This gives a sequence of explicitly given algebras such that the 
multiplicative complexity comes arbitrarily close to three times the dimension 
of the algebra. This is the best we can expect from currently known lower 
bound techniques for bilinear problems. To our knowledge, this is the first lower 
bound of this kind. 
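
How Theorem 2 arises from Theorem 1 can be checked numerically. The Python sketch below is an added illustration with hypothetical function names. It assumes that the $j$th power of the radical of $P_{n,d}$ is spanned by the monomials of degree $j, \ldots, d$, so that its dimension is $\binom{n+d}{n} - \binom{n+j-1}{n}$, and it assumes the form of the Theorem 2 bound with $\lceil d/2 \rceil$ and $\lfloor d/2 \rfloor$. Under these assumptions the best choice of the two exponents in Theorem 1 is $\lceil d/2 \rceil + 1$ and $\lfloor d/2 \rfloor + 1$.

```python
from math import comb

def dim_p(n, d):
    """Dimension of P_{n,d}: number of monomials of degree at most d."""
    return comb(n + d, n)

def dim_rad_power(n, d, j):
    """Dimension of (rad P_{n,d})^j: monomials of degree j, ..., d."""
    return max(0, comb(n + d, n) - comb(n + j - 1, n))

def theorem1_bound(n, d, m1, m2):
    """Right-hand side of Theorem 1 for P_{n,d} with exponents m1, m2 >= 1."""
    return (dim_p(n, d) + dim_rad_power(n, d, m1) + dim_rad_power(n, d, m2)
            - dim_rad_power(n, d, m1 + m2 - 1))

def theorem2_bound(n, d):
    """Closed form assumed for Theorem 2."""
    return 3 * comb(n + d, n) - comb(n + (d + 1) // 2, n) - comb(n + d // 2, n)

if __name__ == "__main__":
    n, d = 50, 2
    print(theorem2_bound(n, d), dim_p(n, d), theorem2_bound(n, d) / dim_p(n, d))
```

For fixed $d > 1$ the two subtracted binomials are of lower order in $n$ than $\binom{n+d}{n}$, which is the content of Remark 1.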



5 Lower Bounds: The Hard Case 

Throughout the remainder of this section, we use the following notations: as 
usual, $A$ is an associative algebra. We denote its multiplication map by $\phi$. We 
assume that we have a decomposition $A = I \oplus X \oplus Y$ (as vector spaces) with 
vector spaces $X$ and $Y$ and a twosided ideal $I$. Furthermore, $I^2 = \{0\}$ and 
$Y \cdot I = \{0\}$. Moreover, let $U \subseteq X$ and $V \subseteq Y$ be vector spaces such that for all 
projections $\pi$ of $A$ onto $I \oplus U \oplus V$, the bilinear map $\pi \circ \phi$ is 1-concise, that is, 
its left kernel $\{a \in A \mid \pi \circ \phi(a, A) = \{0\}\}$ equals $\{0\}$. 

Our general plan looks as follows: using Lemma 2 we reduce the proof of a 
lower bound for $C(\phi)$ to the proof of a lower bound for $C(\phi|_{X \times I})$ where $\phi|_{X \times I}$ 
denotes the restriction of $\phi$ to $X \times I$. This reduction works for any algebra with 
the above decomposition property. After that, we have to estimate $C(\phi|_{X \times I})$ 
individually. This is done in the next section for the particularly important case 
of upper triangular matrices. The main result of this section is the following 
lower bound. 

Lemma 7. With the notations from above, 

$$C(A) \ge C(\phi|_{X \times I}) + \dim A + \dim Y - \dim U - \dim V.$$

Proof. Let $\beta = (f_1, g_1, w_1, \ldots, f_\ell, g_\ell, w_\ell)$ be a quadratic computation for $A$. Since 
$w_1, \ldots, w_\ell$ generate $A$, we may assume that $\operatorname{lin}\{w_{\ell-m+1}, \ldots, w_\ell\} \oplus I \oplus U \oplus V = A$ 
where $m = \dim A - \dim I - \dim U - \dim V$. Let $\pi$ denote the projection along 
$\operatorname{lin}\{w_{\ell-m+1}, \ldots, w_\ell\}$ onto $I \oplus U \oplus V$. Then $\beta' = (f_1, g_1, w'_1, \ldots, f_{\ell'}, g_{\ell'}, w'_{\ell'})$ with 
$w'_\lambda = \pi(w_\lambda)$ and $\ell' = \ell - m$ is a quadratic computation for $\pi \circ \phi$. 

We claim that $\beta'$ separates $(A, \{0\}, \{0\})$. If this was not the case, then there 
would be an $a \in A \setminus \{0\}$ by Lemma 3 such that 

$$\pi \circ \phi(a, A) \subseteq \phi(\{0\}, \{0\}) + \{0\} = \{0\},$$

contradicting the assumption that $\pi \circ \phi$ is 1-concise. 

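From the definition of “separate”, it follows that $\beta'$ also separates 
$(I \oplus Y, \{0\}, \{0\})$. In other words, $f_1|_{(I \oplus Y) \times \{0\}}, \ldots, f_{\ell'}|_{(I \oplus Y) \times \{0\}}$ generate 
$((I \oplus Y) \times \{0\})^*$ after possibly exchanging some of the $f_\lambda$ with the corresponding $g_\lambda$. 
Let $\psi = (\pi \circ \phi)|_{A \times I}$. Obviously $\tilde{\beta} = (\tilde{f}_1, \tilde{g}_1, w'_1, \ldots, \tilde{f}_{\ell'}, \tilde{g}_{\ell'}, w'_{\ell'})$ with 
$\tilde{f}_\lambda = f_\lambda|_{A \times I}$ and $\tilde{g}_\lambda = g_\lambda|_{A \times I}$ is a quadratic computation for $\psi$. 

As $(I \oplus Y) \times \{0\} \subseteq A \times I$, we have 
$\tilde{f}_\lambda|_{(I \oplus Y) \times \{0\}} = (f_\lambda|_{A \times I})|_{(I \oplus Y) \times \{0\}} = f_\lambda|_{(I \oplus Y) \times \{0\}}$. 
From this, we get that also $\tilde{\beta}$ separates $(I \oplus Y, \{0\}, \{0\})$. Lemma 2 now yields 
the lower bound 

$$\ell' \ge C(\psi/((I \oplus Y) \times \{0\})) + \dim I + \dim Y. \qquad (6)$$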

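By the definition of “quotient” in Section 2.2, $\psi/((I \oplus Y) \times \{0\})$ is a bilinear 
map $A/(I \oplus Y) \times I \to (I \oplus U \oplus V)/\tilde{W}$ that maps $(a + (I \oplus Y), b)$ to $\pi \circ \phi(a, b) + \tilde{W}$ 
where $\tilde{W} = \operatorname{lin}\{\pi \circ \phi(A, \{0\})\} + \operatorname{lin}\{\pi \circ \phi(I \oplus Y, I)\}$. (For a vector space $Z$, we 
identify $Z/\{0\}$ with $Z$.) Since $I^2 = \{0\}$ and $Y \cdot I = \{0\}$ by assumption, $\tilde{W} = \{0\}$. 
For $x + I \oplus Y \in A/(I \oplus Y)$ and $t \in I$, 

$$\psi/((I \oplus Y) \times \{0\})(x + I \oplus Y, t) = \pi \circ \phi(x, t) = x t,$$

since $x \cdot t \in I$. Thus, the following diagram commutes 

$$\begin{array}{ccc} A/(I \oplus Y) \times I & \longrightarrow & I \oplus U \oplus V \\ \downarrow {\scriptstyle h \times \mathrm{id}} & & \uparrow \\ X \times I & \longrightarrow & I \end{array}$$

where $h : A/(I \oplus Y) \to X$ denotes the canonical isomorphism. Hence we obtain 
$C(\psi/((I \oplus Y) \times \{0\})) \ge C(\phi|_{X \times I})$. 

Exploiting $\ell' = \ell - m$ and choosing $\beta$ to be an optimal computation, the 
claim of the lemma follows from (6). □ 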

6 Multiplication of Upper Triangular Matrices 

We now apply the results of the preceding section to the algebra $T_n(k)$ of upper 
triangular $n \times n$-matrices with entries from $k$. For the sake of simplicity, we 
assume that $n$ is even. 

In the following, let $e_{i,j} \in T_n(k)$ denote the matrix that has a one in 
position $(i, j)$ and zeros elsewhere for $1 \le i \le j \le n$. The radical $R$ of $T_n(k)$ equals 
the linear span of all $e_{i,j}$ with $i < j$, that is, $R$ is the set of all matrices with 
purely zeros on the diagonal. More generally, the power $R^h$ equals the linear span 
of all $e_{i,j}$ with $i + h \le j$. 

Fig. 1. The decomposition of $T_n(k)$ 

We will first step down from $T_n(k)$ to the quotient $A = T_n(k)/R^{n/2+1}$. By 
Corollary 1 we obtain 

$$C(T_n(k)) \ge C(A) + 2 \dim R^{n/2+1} = C(A) + \tfrac{1}{4} n^2 - \tfrac{1}{2} n. \qquad (7)$$

The multiplication in $A$ corresponds to the multiplication of upper triangular 
matrices where we do not compute the entries in the positions $(i, j)$ with 
$i + n/2 + 1 \le j$. We use this representation in the following. 

Next, we have to instantiate $I$, $X$, $Y$, $U$, and $V$. For the remainder of this 
section, let $m = n/2$. We choose (see Figure 1 for an illustration) 

$I = \operatorname{lin}\{e_{i,j} \mid i \le m \text{ and } j > m\}$, $X = \operatorname{lin}\{e_{i,j} \mid i \le m \text{ and } j \le m\}$, 

$Y = \operatorname{lin}\{e_{i,j} \mid i > m \text{ and } j > m\}$, $U = \{0\}$, 

$V = \operatorname{lin}\{e_{m+1,n}, e_{m+2,n}, \ldots, e_{n,n}\}$. 

Obviously, $A = I \oplus X \oplus Y$. A straightforward calculation shows that $I^2 = \{0\}$ 
and $Y \cdot I = \{0\}$ (in $A$). Moreover, to fulfil the assumptions of Lemma 7 we have 
to show that for any projection $\pi$ onto $I \oplus U \oplus V = I \oplus V$, $\pi \circ \phi$ is 1-concise, where 
$\phi$ denotes the multiplication map of $A$. So for each $e_{i,j} \in A$, we have to find an 
element $a \in A$ such that $\pi(e_{i,j} \cdot a) \ne 0$. We consider three cases: if $e_{i,j} \in I$, that 
is, $i \le m$ and $j > m$, then $e_{i,j} \cdot e_{j,j} = e_{i,j} \in I$, thus $\pi(e_{i,j}) = e_{i,j} \ne 0$. If $e_{i,j} \in X$, 
i.e., $i \le m$ and $j \le m$, then $e_{i,j} \cdot e_{j,m+1} = e_{i,m+1} \in I$. If finally $e_{i,j} \in Y$, that is, 
$i > m$ and $j > m$, then $e_{i,j} \cdot e_{j,n} = e_{i,n} \in V$, thus $\pi(e_{i,j} \cdot e_{j,n}) = e_{i,n} \ne 0$. The 
1-conciseness of $\pi \circ \phi$ follows from this. 

It remains to estimate C{(j)\xxi)- Our aim is to use the following lemma 
which is proven in P, Lemma 4]. In what follows, \B, C] := BC — CB denotes 
the Lie product of two matrices. 

Lemma 8. Let $k$ be a field. Let $t$ be a tensor with 3-slices $I_N, B, C \in k^{N \times N}$. 

Then $C(t) \ge N + \tfrac{1}{2} \operatorname{rk}[B, C]$. 
To utilize this lemma, we have to determine the tensor of $\phi|_{X \times I}$: for a clearer 
presentation, we choose the basis 

$$\underbrace{e_{1,1}, \ldots, e_{1,m}}_{\text{first group}},\ \ldots,\ \underbrace{e_{i,i}, \ldots, e_{i,m}}_{i\text{th group}},\ \ldots,\ \underbrace{e_{m,m}}_{m\text{th group}}$$

for $X$ (in this row-wise order) and the basis 

$$\underbrace{e_{1,m+1}, \ldots, e_{m,m+1}}_{\text{first group}},\ \ldots,\ \underbrace{e_{j,m+j}, \ldots, e_{m,m+j}}_{j\text{th group}},\ \ldots,\ \underbrace{e_{m,n}}_{m\text{th group}}$$

for $I$ (column-wise ordering). The third basis (again for $I$, since $\phi|_{X \times I}$ is a 
mapping $X \times I \to I$) equals the second basis (but we forget about the groups). 
We denote the 3-slice of the tensor of $\phi|_{X \times I}$ that corresponds to $e_{i,m+j}$ by $T_{i,j}$ 
for $j \le i \le m$. The $T_{i,j}$ are matrices of size $M \times M$ where $M = \tfrac{1}{2} m(m + 1)$. We 
associate a block structure with the $T_{i,j}$'s induced by the above groups of the 
first and second basis. 

In $T_{i,j}$, the only positions with nonzero entries are in the block at position 
$(i, j)$, that is, in the positions whose rows and columns correspond to the vectors 
of the $i$th and $j$th group of the above two bases, respectively. 

An easy calculation shows that the entries of $T_{i,j}$ within these positions equal 

$$\bigl(0_{(m-i+1) \times (i-j)},\ I_{m-i+1}\bigr) \qquad (8)$$

where $I_\kappa$ denotes the $\kappa \times \kappa$-identity matrix and $0_{\kappa \times \nu}$ denotes the zero matrix of 
size $\kappa \times \nu$. In particular, the 3-slices are linearly independent. From a macroscopic 
point of view with respect to the above block structure, the $T_{i,j}$ are block lower 
triangular matrices “of the form $e_{i,j}$” with the above matrix (8) as the only 
nonzero entry (instead of a one). 
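
The shape of the 3-slices can be verified for small $m$ by building them from the multiplication rule $e_{a,b} \cdot e_{c,l} = e_{a,l}$ if $b = c$ and $0$ otherwise. The Python sketch below is an added illustration with hypothetical function names. It assumes the orderings of the two bases described above (row-wise groups for $X$, column-wise groups for $I$) and checks that the $(i, j)$ block of $T_{i,j}$ is a zero block of width $i - j$ followed by an identity matrix of size $m - i + 1$.

```python
def bases(m):
    """Ordered bases of X and I as lists of index pairs (row, column)."""
    x_basis = [(i, j) for i in range(1, m + 1) for j in range(i, m + 1)]
    i_basis = [(i, m + j) for j in range(1, m + 1) for i in range(j, m + 1)]
    return x_basis, i_basis

def slice_t(m, i, j):
    """3-slice T_{i,j}: coefficient of e_{i,m+j} in products x * y."""
    x_basis, i_basis = bases(m)
    target = (i, m + j)
    t = [[0] * len(i_basis) for _ in x_basis]
    for r, (a, b) in enumerate(x_basis):
        for c, (row, col) in enumerate(i_basis):
            if b == row and (a, col) == target:  # e_{a,b} e_{b,col} = e_{a,col}
                t[r][c] = 1
    return t

def block(m, t, i, j):
    """Entries of t in the rows of group i of X and the columns of group j of I."""
    x_basis, i_basis = bases(m)
    rows = [r for r, (a, _) in enumerate(x_basis) if a == i]
    cols = [c for c, (_, col) in enumerate(i_basis) if col == m + j]
    return [[t[r][c] for c in cols] for r in rows]

def expected_block(m, i, j):
    """Zero block of width i - j followed by the identity of size m - i + 1."""
    return [[1 if c == (i - j) + r else 0 for c in range(m - j + 1)]
            for r in range(m - i + 1)]

if __name__ == "__main__":
    print(block(3, slice_t(3, 2, 1), 2, 1))
```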

By Lemma 1 the fact that $C(\phi|_{X \times I}) \le \ell$ is equivalent to the existence of 
rank one matrices $P_1, \ldots, P_\ell$ such that 

$$\begin{pmatrix} 0 & T_{i,j} \\ T_{i,j}^\top & 0 \end{pmatrix} \in \operatorname{lin}\{P_1 + P_1^\top, \ldots, P_\ell + P_\ell^\top\} \quad \text{for } j \le i \le m.$$

We now exploit the Steinitz exchange to save one product for each tensor $T_{\mu,\nu}$ 
with $\mu \ge \nu + 2$: there are matrices $S_1, \ldots, S_m$ and $Q_1, \ldots, Q_{m-1}$ in 
$\operatorname{lin}\{T_{\mu,\nu} \mid \mu \ge \nu + 2\}$ such that after a suitable permutation of the $P_1, \ldots, P_\ell$ 

$$\begin{pmatrix} 0 & T_{\mu,\mu} - S_\mu \\ (T_{\mu,\mu} - S_\mu)^\top & 0 \end{pmatrix},\ \begin{pmatrix} 0 & T_{\nu+1,\nu} - Q_\nu \\ (T_{\nu+1,\nu} - Q_\nu)^\top & 0 \end{pmatrix} \in \operatorname{lin}\{P_1 + P_1^\top, \ldots, P_{\ell-s} + P_{\ell-s}^\top\}$$

for $1 \le \mu \le m$, $1 \le \nu \le m - 1$, (9) 

where $s = \tfrac{1}{2} m(m + 1) - m - (m - 1)$. Thus we have killed $s$ products. 

Let $\lambda_1, \ldots, \lambda_m \in k$ be pairwise distinct. Define 

$E = T_{1,1} - S_1 + \cdots + T_{m,m} - S_m$, 

$B = \lambda_1 (T_{1,1} - S_1) + \cdots + \lambda_m (T_{m,m} - S_m)$, 

$C = T_{2,1} - Q_1 + \cdots + T_{m,m-1} - Q_{m-1}$. 

From (9), we obtain 

$$C(\phi|_{X \times I}) \ge s + C(E, B, C). \qquad (10)$$

With respect to the above block structure, $E$ has solely identity matrices on the 
main diagonal and zero matrices on the first subdiagonal. The matrix $B$ has 
$\lambda_\mu$ multiples of identity matrices on the main diagonal and also zero matrices 
on the first subdiagonal. The matrix $C$ has zero matrices on the diagonal and 
“nearly” identity matrices, more precisely, a line of zeros with an identity matrix 
(as depicted in (8)) on the first subdiagonal. 

The matrix E is invertible. By 

C{E,B,C) = C{Im,BE-\CE-^). (11) 

Due to the structure of E, $E^{-1}$ also has solely identity matrices on the main
diagonal and zero matrices on the first subdiagonal. Thus, $BE^{-1}$ has $\lambda_\mu$-multiples
of identity matrices on the main diagonal and zero matrices on the first
subdiagonal. In the same way, $CE^{-1}$ has zero matrices on the diagonal and “nearly”
identity matrices on the first subdiagonal. Some easy algebra shows that due to
this structure, the Lie product $[BE^{-1}, CE^{-1}]$ has zero matrices in the blocks on
the main diagonal and the matrix

' To " 

in the $(j+1, j)$-block (on the first subdiagonal) for $1 \le j \le m-1$. Hence, the
rank of $[BE^{-1}, CE^{-1}]$ is at least $1 + 2 + \cdots + (m-1) = \frac{1}{2}(m-1)m$.

Together with II 1 1 III . Ill III , and Lemma 0 the last statement implies the fol- 
lowing lower bound. 

Lemma 9. With the notations from above, C{(f>\xxi) ^ ~ |™+ 1- 

Exploiting 0 and then bounding C{A) by LemmaQand Lemma0 we obtain 
the following lower bound. 

Theorem 3. For even n, the multiplicative complexity of the multiplication of 
upper triangular matrices of size n x n has the lower bound 

C(T„(fc)) > - |n -k 1 > (2| - o(l)) dimT„(fc). 

Remark 2. For odd n, the same approach also yields $(2\frac{1}{8} - o(1)) \dim T_n(k)$ as a
lower bound. A quick solution goes as follows: simply embed $T_{n-1}(k)$ into $T_n(k)$
and apply the above theorem. We only lose an additive amount of $O(n) \le
o(\dim T_n(k))$.
Abstract. We consider the problem of enumerating all minimal integer
solutions of a monotone system of linear inequalities. We first show that 
for any monotone system of r linear inequalities in n variables, the num- 
ber of maximal infeasible integer vectors is at most rn times the number 
of minimal integer solutions to the system. This bound is accurate up 
to a polylog(r) factor and leads to a polynomial-time reduction of the
enumeration problem to a natural generalization of the well-known du- 
alization problem for hypergraphs, in which dual pairs of hypergraphs 
are replaced by dual collections of integer vectors in a box. We provide 
a quasi-polynomial algorithm for the latter dualization problem. These 
results imply, in particular, that the problem of incrementally generating 
minimal integer solutions of a monotone system of linear inequalities can 
be done in quasi-polynomial time. 

Keywords: Integer programming, complexity of incremental algorithms, 
dualization, quasi-polynomial time, monotone discrete binary functions, 
monotone inequalities, regular discrete functions. 



1 Introduction 

Consider a system of r linear inequalities in n integer variables 

$$Ax \ge b, \quad x \in \mathcal{C} = \{x \in \mathbb{Z}^n \mid 0 \le x \le c\}, \qquad (1)$$

where A is a rational $r \times n$ matrix, b is a rational r-vector, and c is a non-negative
integral n-vector some or all of whose components may be infinite. We assume
that (1) is a monotone system of inequalities: if $x \in \mathcal{C}$ satisfies (1), then any
vector $y \in \mathcal{C}$ such that $y \ge x$ is also feasible. For instance, (1) is monotone if the
matrix A is non-negative. Let us denote by $\mathcal{F} = \mathcal{F}_{A,b,c}$ the set of all minimal
feasible integral vectors for (1), i.e., $y \in \mathcal{F}$ if there is no solution x of (1) such that
$x \le y$, $x \ne y$. In particular, we have
$\{x \in \mathcal{C} \mid Ax \ge b\} = \bigcup_{y \in \mathcal{F}} \{x \in \mathcal{C} \mid x \ge y\}$.

In this paper, we are concerned with the problem of incrementally generating $\mathcal{F}$:

GEN($\mathcal{F}_{A,b,c}, \mathcal{X}$): Given a monotone system (1) and a set $\mathcal{X} \subseteq \mathcal{F}_{A,b,c}$ of minimal
feasible vectors for (1), either find a new minimal integral vector satisfying
(1), or show that $\mathcal{X} = \mathcal{F}_{A,b,c}$.

The entire set $\mathcal{F} = \mathcal{F}_{A,b,c}$ can be constructed by initializing $\mathcal{X} = \emptyset$ and iteratively
solving the above problem $|\mathcal{F}| + 1$ times.
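The incremental scheme can be illustrated by a brute-force sketch in Python. It assumes a box small enough to enumerate completely, and the helper names `feasible`, `is_minimal`, `gen_step` and `generate_all` are hypothetical; the exhaustive search merely stands in for the quasi-polynomial procedure developed in this paper.

```python
from itertools import product

def feasible(A, b, x):
    """Check Ax >= b row by row."""
    return all(sum(a * xi for a, xi in zip(row, x)) >= bi for row, bi in zip(A, b))

def is_minimal(A, b, x):
    """For a monotone system, x is minimal feasible iff it is feasible and
    lowering any positive coordinate by one breaks feasibility."""
    if not feasible(A, b, x):
        return False
    for j, xj in enumerate(x):
        if xj > 0:
            y = list(x)
            y[j] -= 1
            if feasible(A, b, y):
                return False
    return True

def gen_step(A, b, c, known):
    """One call of GEN: a minimal feasible vector outside `known`, or None."""
    for x in product(*(range(cj + 1) for cj in c)):
        if x not in known and is_minimal(A, b, x):
            return x
    return None

def generate_all(A, b, c):
    """Start from the empty set and call GEN until it reports completeness."""
    known, calls = set(), 0
    while True:
        calls += 1
        x = gen_step(A, b, c, known)
        if x is None:
            return known, calls
        known.add(x)

# x1 + x2 >= 2 in the box 0 <= x <= (2, 2)
F, calls = generate_all([[1, 1]], [2], (2, 2))
print(sorted(F), calls)  # [(0, 2), (1, 1), (2, 0)] 4
```

The final call is the one that certifies completeness, which is why the number of calls is $|\mathcal{F}| + 1$.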

If A is a binary matrix, and b, c are vectors of all ones, then $\mathcal{F}$ is the set
of (characteristic vectors of) all minimal transversals to the hypergraph defined
by the rows of A. In this case, problem GEN($\mathcal{F}_{A,b,c}, \mathcal{X}$) turns into the well-known
hypergraph dualization problem: incrementally enumerate all the minimal
transversals (equivalently, all the maximal independent sets) for a given
hypergraph (see, e.g., j,3|1 Oj ). Some applications of the hypergraph dualization problem
are discussed in 1 1 1/19) . The case where A is binary, c is the vector of all ones and
b is arbitrary, is equivalent to the generation of so-called multiple transversals
0. If A is integral and $c = +\infty$, the generation of $\mathcal{F}$ can also be regarded as the
computation of the Hilbert basis for the ideal $\{x \in \mathbb{Z}^n \mid Ax \ge b,\ x \ge 0\}$. One
more application of problem GEN($\mathcal{F}_{A,b,c}, \mathcal{X}$) is related to stochastic programming,
more precisely to the generation of minimal p-efficient points for a given
probability distribution of a discrete random variable $\xi \in \mathbb{Z}^n$. An integer vector
$y \in \mathbb{Z}^n$ is called p-efficient if $\mathrm{Prob}(\xi \le y) \ge p$. It is known that for every
probability distribution and every $p > 0$ there are finitely many minimal p-efficient
points, and furthermore that for r-concave probability distributions these points are
exactly the minimal integral points of a corresponding convex monotone system
(see, e.g., [HI).



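Let $J^* = \{j \mid c_j = \infty\}$ and $J_* = \{1, \ldots, n\} \setminus J^*$ be, respectively, the sets of
unbounded and bounded integer variables in (1). Consider an arbitrary vector
$x = (x_1, \ldots, x_n) \in \mathcal{F}_{A,b,c}$ such that $x_j > 0$ for some $j \in J^*$. Then it is easy to
see that

$$x_j \le \max_{i:\, a_{ij} > 0} \left\lceil \frac{b_i - \sum_{k \in J_*} \min\{0, a_{ik}\}\, c_k}{a_{ij}} \right\rceil < +\infty. \qquad (2)$$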



Since the bounds of (2) are easy to compute, and since appending these
bounds to (1) does not change the set $\mathcal{F}_{A,b,c}$, we shall assume in the sequel that
all components of the non-negative vector c are finite, even though this may not
be the case for the original system. This assumption does not entail any loss of
generality and allows us to consider $\mathcal{F}_{A,b,c}$ as a system of integral vectors in a
finite box. We shall also assume that the input monotone system (1) is feasible,
i.e., $\mathcal{F}_{A,b,c} \ne \emptyset$. For a finite and non-negative c this is equivalent to $Ac \ge b$.
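Let $\mathcal{A}$ be a collection of integral vectors in $\mathcal{C}$ and let $\mathcal{A}^+ = \{x \in \mathcal{C} \mid x \ge
a \text{ for some } a \in \mathcal{A}\}$ and $\mathcal{A}^- = \{x \in \mathcal{C} \mid x \le a \text{ for some } a \in \mathcal{A}\}$ denote the ideal
and filter generated by $\mathcal{A}$. Any element in $\mathcal{C} \setminus \mathcal{A}^+$ is called independent of $\mathcal{A}$. Let
$\mathcal{I}(\mathcal{A})$ be the set of all maximal independent elements for $\mathcal{A}$; then for any finite
box $\mathcal{C}$ we have the decomposition:

$$\mathcal{A}^+ \cap \mathcal{I}(\mathcal{A})^- = \emptyset, \qquad \mathcal{A}^+ \cup \mathcal{I}(\mathcal{A})^- = \mathcal{C}. \qquad (3)$$

In particular, if $\mathcal{A}$ is the set $\mathcal{F} = \mathcal{F}_{A,b,c}$ of all minimal feasible integral vectors
for (1), then the ideal $\mathcal{F}^+$ is the solution set of (1), while the filter $\mathcal{C} \setminus \mathcal{F}^+$ is
generated by the set $\mathcal{I}(\mathcal{F})$ of all maximal infeasible integral vectors for (1):

$$\{x \in \mathcal{C} \mid Ax \not\ge b\} = \mathcal{I}(\mathcal{F})^- = \bigcup_{y \in \mathcal{I}(\mathcal{F})} \{y\}^-.$$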

It is known that the problem of incrementally generating all maximal infeasible
vectors for (1) is NP-hard even if c is the vector of all ones and the matrix A is
binary:

Proposition 1 (cf. EH ). Given a binary matrix A and a set $\mathcal{X} \subseteq \mathcal{I}(\mathcal{F}_{A,b,c})$
of maximal infeasible Boolean vectors for $Ax \ge b$, $x \in \{0,1\}^n$, it is NP-complete
to decide if the set $\mathcal{X}$ can be extended, that is, if $\mathcal{I}(\mathcal{F}_{A,b,c}) \setminus \mathcal{X} \ne \emptyset$.

In contrast to that, we show in this paper that the problem of incrementally
generating all minimal feasible vectors for (1) is unlikely to be NP-hard.

Theorem 1. Problem GEN($\mathcal{F}_{A,b,c}, \mathcal{X}$) can be solved in quasi-polynomial time
$\mathrm{poly}(|\mathrm{input}|) + t^{o(\log t)}$, where $t = \max\{n, r, |\mathcal{X}|\}$.

It was conjectured in HH that problem GEN($\mathcal{F}_{A,b,c}, \mathcal{X}$) cannot be solved in
polynomial time unless P=NP.

To prove Theorem 1, we first bound the number of maximal infeasible vectors
for (1) in terms of the dimension of the system and the number of minimal
feasible vectors.

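Theorem 2. Suppose that the monotone system (1) is feasible, i.e., $Ac \ge b$.
Then for any non-empty set $\mathcal{X} \subseteq \mathcal{F}_{A,b,c}$ we have

$$|\mathcal{I}(\mathcal{X}) \cap \mathcal{I}(\mathcal{F}_{A,b,c})| \le r \sum_{x \in \mathcal{X}} p(x), \qquad (4)$$

where $p(x)$ is the number of positive components of x. In particular,

$$|\mathcal{I}(\mathcal{X}) \cap \mathcal{I}(\mathcal{F}_{A,b,c})| \le rn|\mathcal{X}|,$$

which for $\mathcal{X} = \mathcal{F}_{A,b,c}$ leads to the inequality $|\mathcal{I}(\mathcal{F}_{A,b,c})| \le rn|\mathcal{F}_{A,b,c}|$.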
It should be mentioned that the bounds of Theorem 2 are sharp for r = 1, e.g.,
for the inequality $x_1 + \cdots + x_n \ge n$. For large r, these bounds are accurate up to
a factor poly-logarithmic in r. To see this, let n = 2k and consider the monotone
system of $r = 2^k$ inequalities of the form

$$x_{i_1} + x_{i_2} + \cdots + x_{i_k} \ge 1, \quad i_1 \in \{1, 2\},\ i_2 \in \{3, 4\},\ \ldots,\ i_k \in \{2k-1, 2k\},$$

where $x = (x_1, \ldots, x_n) \in \mathcal{C} = \{x \in \mathbb{Z}^n \mid 0 \le x \le c\}$. For any positive integral
vector c, this system has $2^k$ maximal infeasible integral vectors and only k
minimal feasible integral vectors, i.e.,

$$|\mathcal{I}(\mathcal{F}_{A,b,c})| = 2^k = \frac{rn}{2(\log_2 r)^2}\, |\mathcal{F}_{A,b,c}|.$$

Needless to say, in general $|\mathcal{F}_{A,b,c}|$ cannot be bounded by a polynomial
in r, n, and $|\mathcal{I}(\mathcal{F}_{A,b,c})|$. For instance, for n = 2k the system of k inequalities
$x_1 + x_2 \ge 1,\ x_3 + x_4 \ge 1,\ \ldots,\ x_{2k-1} + x_{2k} \ge 1$ has $2^k$ minimal feasible binary
vectors and only k maximal infeasible binary vectors.
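Both counting examples can be checked by exhaustive enumeration for small k. The following Python sketch assumes the unit box $c = (1, \ldots, 1)$ and 0-based variable indices; the helper names are hypothetical and the enumeration is exponential, so it only serves as a sanity check.

```python
from itertools import product

def feasible(A, b, x):
    return all(sum(a * xi for a, xi in zip(row, x)) >= bi for row, bi in zip(A, b))

def minimal_feasible(A, b, c):
    """Feasible points where lowering any positive coordinate breaks feasibility."""
    out = []
    for x in product(*(range(cj + 1) for cj in c)):
        if feasible(A, b, x) and all(
            not feasible(A, b, x[:j] + (x[j] - 1,) + x[j + 1:])
            for j in range(len(x)) if x[j] > 0
        ):
            out.append(x)
    return out

def maximal_infeasible(A, b, c):
    """Infeasible points where raising any coordinate inside the box gives feasibility."""
    out = []
    for x in product(*(range(cj + 1) for cj in c)):
        if not feasible(A, b, x) and all(
            feasible(A, b, x[:j] + (x[j] + 1,) + x[j + 1:])
            for j in range(len(x)) if x[j] < c[j]
        ):
            out.append(x)
    return out

def transversal_system(k):
    """r = 2**k rows, each picking one variable from every pair {2j, 2j+1}."""
    rows = []
    for choice in product((0, 1), repeat=k):
        row = [0] * (2 * k)
        for j, bit in enumerate(choice):
            row[2 * j + bit] = 1
        rows.append(row)
    return rows, [1] * len(rows)

def pair_system(k):
    """k rows of the form x_{2j} + x_{2j+1} >= 1."""
    rows = []
    for j in range(k):
        row = [0] * (2 * k)
        row[2 * j] = row[2 * j + 1] = 1
        rows.append(row)
    return rows, [1] * k

k = 3
c = (1,) * (2 * k)
A1, b1 = transversal_system(k)
A2, b2 = pair_system(k)
print(len(minimal_feasible(A1, b1, c)), len(maximal_infeasible(A1, b1, c)))  # 3 8
print(len(minimal_feasible(A2, b2, c)), len(maximal_infeasible(A2, b2, c)))  # 8 3
```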

Let us add finally that if the number of inequalities in (1) is fixed, then
$|\mathcal{F}_{A,b,c}|$ can also be polynomially bounded by $|\mathcal{I}(\mathcal{F}_{A,b,c})|$, and accordingly, the
set of all maximal infeasible integer vectors for (1) can be generated in
quasi-polynomial time. In other words, Proposition 1 cannot hold for r = const unless
any problem in NP can be solved in quasi-polynomial time. Furthermore, for
systems with fixed number of non-zero coefficients per inequality and bounded
box size, problem GEN($\mathcal{F}_{A,b,c}, \mathcal{X}$) can be efficiently solved in parallel (see P|).

We prove Theorem 2 in Section 2 and then use this theorem in the next
section to reduce problem GEN($\mathcal{F}_{A,b,c}, \mathcal{X}$) to a natural generalization of the
hypergraph dualization problem. Our generalized dualization problem replaces
hypergraphs by collections of integer vectors in a box.



Theorem 3. GEN($\mathcal{F}_{A,b,c}, \mathcal{X}$) is polynomial-time reducible to the following
problem:

DUAL($\mathcal{C}, \mathcal{A}, \mathcal{B}$): Given an integral box $\mathcal{C}$, a family of vectors $\mathcal{A} \subseteq \mathcal{C}$, and a
collection of maximal independent elements $\mathcal{B} \subseteq \mathcal{I}(\mathcal{A})$, either find a new
maximal independent element $x \in \mathcal{I}(\mathcal{A}) \setminus \mathcal{B}$, or prove that $\mathcal{B} = \mathcal{I}(\mathcal{A})$.

Note that for $\mathcal{C} = \{0,1\}^n$, problem DUAL($\mathcal{C}, \mathcal{A}, \mathcal{B}$) turns into the hypergraph
dualization problem. Other applications of the dualization problem on boxes can
be found in |2ttill3j . In Section 4 we extend the hypergraph dualization algorithm
of El to problem DUAL($\mathcal{C}, \mathcal{A}, \mathcal{B}$) and show that the latter problem can be solved
in quasi-polynomial time:



Theorem 4. Given two sets $\mathcal{A}$ and $\mathcal{B} \subseteq \mathcal{I}(\mathcal{A})$ in an integral box $\mathcal{C} = \{x \in
\mathbb{Z}^n \mid 0 \le x \le c\}$, problem DUAL($\mathcal{C}, \mathcal{A}, \mathcal{B}$) can be solved in $\mathrm{poly}(n, m) + m^{o(\log m)}$
time, where $m = |\mathcal{A}| + |\mathcal{B}|$.

Clearly, Theorem 1 follows from Theorems 3 and 4. The special cases of
Theorems 2 and 3 for Boolean systems $x \in \{0, 1\}^n$ can be found in 0.

The remainder of the paper consists of the proofs of Theorems 2, 3 and 4 in
Sections 2, 3 and 4, respectively.
2 Bounding the Number of Maximal Infeasible Vectors

In this section we prove Theorem 2. We first need some notations and definitions.

Let $\mathcal{C} = \{x \in \mathbb{Z}^n \mid 0 \le x \le c\}$ be a box and let $f : \mathcal{C} \to \{0, 1\}$ be a discrete
binary function. The function f is called monotone if $f(x) \ge f(y)$ whenever
$x \ge y$ and $x, y \in \mathcal{C}$. We denote by $T(f)$ and $F(f)$ the sets of all true and all false
vectors of f, i.e.,

$$T(f) = \{x \in \mathcal{C} \mid f(x) = 1\} = (\min[f])^+, \qquad F(f) = \{x \in \mathcal{C} \mid f(x) = 0\} = (\max[f])^-,$$

where $\min[f]$ and $\max[f]$ are the sets of all minimal true and all maximal false
vectors of f, respectively.

Let $\sigma \in S_n$ be a permutation of the coordinates and let x, y be two n-vectors.
We say that y is a left-shift of x and write $y \succeq_\sigma x$ if the inequalities

$$\sum_{i=1}^{k} y_{\sigma_i} \ge \sum_{i=1}^{k} x_{\sigma_i}$$

hold for all $k = 1, \ldots, n$. A discrete binary function $f : \mathcal{C} \to \{0, 1\}$ is called
2-monotonic with respect to $\sigma$ if $f(y) \ge f(x)$ whenever $y \succeq_\sigma x$ and $x, y \in \mathcal{C}$.
Clearly, $y \ge x$ implies $y \succeq_\sigma x$ for any $\sigma \in S_n$, so that any 2-monotonic function
is monotone.
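The left-shift relation amounts to a comparison of prefix sums, which the following Python sketch makes concrete. The function name `is_left_shift` is hypothetical, and the permutation is assumed to be given as a 0-based list of coordinate indices (the identity if omitted).

```python
from itertools import accumulate

def is_left_shift(y, x, sigma=None):
    """True if every prefix sum of y along sigma is at least that of x."""
    order = range(len(x)) if sigma is None else sigma
    prefix_y = accumulate(y[i] for i in order)
    prefix_x = accumulate(x[i] for i in order)
    return all(py >= px for py, px in zip(prefix_y, prefix_x))

print(is_left_shift((1, 0), (0, 1)))          # True: the unit moved to the left
print(is_left_shift((0, 1), (1, 0)))          # False
print(is_left_shift((2, 1), (1, 1)))          # True: y >= x componentwise
print(is_left_shift((0, 1), (1, 0), [1, 0]))  # True under the reversed order
```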

The function f will be called regular if it is 2-monotonic with respect to
the identity permutation $\sigma = (1, 2, \ldots, n)$. Any 2-monotonic function can be
transformed into a regular one by appropriately re-indexing its variables. To
simplify notations, we shall state Lemma 1 below for regular functions, i.e., we
fix $\sigma = (1, 2, \ldots, n)$ in this lemma.

For a given subset $\mathcal{A} \subseteq \mathcal{C}$ let us denote by $\mathcal{A}^*$ all the vectors which are
left-shifts of some vectors of $\mathcal{A}$, i.e., $\mathcal{A}^* = \{y \in \mathcal{C} \mid y \succeq x \text{ for some } x \in \mathcal{A}\}$. Clearly,
$T(f) = (\min[f])^*$ for a regular function f (in fact, the subfamily of right-most
vectors of $\min[f]$ would be enough to use here).

Given monotone discrete functions f and g, we call g a regular majorant of
f if $g(x) \ge f(x)$ for all $x \in \mathcal{C}$, and g is regular. Clearly, $T(g) \supseteq (\min[f])^*$ must
hold in this case, and the discrete function h defined by $T(h) = (\min[f])^*$ is the
unique minimal regular majorant of f.

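For a vector $x \in \mathcal{C}$ and for an index $1 \le k \le n$, let the vectors $x^{(k]}$ and
$x^{[k)}$ be defined by

$$x^{(k]}_j = \begin{cases} x_j & \text{for } j \le k, \\ 0 & \text{otherwise,} \end{cases} \qquad
x^{[k)}_j = \begin{cases} x_j & \text{for } j \ge k, \\ 0 & \text{otherwise.} \end{cases}$$

Let us denote by e the n-vector of all 1's, let $e_j$ denote the j-th unit vector,
$j = 1, \ldots, n$, and let $p(x)$ denote the number of positive components of the vector
$x \in \mathcal{C}$.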
Lemma 1. Given a monotone discrete binary function $f : \mathcal{C} \to \{0, 1\}$ such that
$f \not\equiv 0$, and a regular majorant $g \ge f$, we have the inequality

$$|F(g) \cap \max[f]| \le \sum_{x \in \min[f]} p(x). \qquad (5)$$

Proof. Let us denote by h the unique minimal regular majorant of f. Then
we have $F(g) \cap \max[f] \subseteq F(h) \cap \max[f]$, and hence it is enough to show the
statement for g = h, i.e., when $T(g) = (\min[f])^*$.

For a vector $y \in \mathcal{C} \setminus \{c\}$ let us denote by $l = l_y$ the index of the last component
which is less than $c_l$, i.e., $l = \max\{j \mid y_j < c_j\} \in \{1, \ldots, n\}$. We claim that for
every $y \in F(h) \cap \max[f]$ there exists an $x \in \min[f]$ such that

$$y = x^{(l-1]} + (x_l - 1) e_l + c^{[l+1)}, \qquad (6)$$

where $l = l_y$. To see this claim, first observe that $y \ne c$ because $y \in F(f)$ and
$f \not\equiv 0$. Second, for any j with $y_j < c_j$ we know that $y + e_j \in T(f)$, by the
definition of a maximal false point. Hence there exists a minimal true vector
$x \in \min[f]$ such that $x \le y + e_l$ for $l = l_y$. We must have $x_i = y_i$ for $i < l$, since
if $x_i < y_i$ for some $i < l$, then $y \ge x + e_i - e_l \succeq x$ would hold, i.e., $y \succeq x$ would
follow, implying $y \in (\min[f])^*$ and yielding a contradiction with $y \in F(h) =
\mathcal{C} \setminus (\min[f])^*$. Finally, the definition of $l = l_y$ implies that $y_i = c_i$ for $i > l$. Hence,
our claim and the equality (6) follow.

The above claim implies that

$$F(h) \cap \max[f] \subseteq \{x^{(l-1]} + (x_l - 1) e_l + c^{[l+1)} \mid x \in \min[f],\ x_l > 0\},$$

and hence (5) and thus the lemma follow. □

Lemma 2. Let $f : \mathcal{C} \to \{0, 1\}$ be a monotone discrete binary function such that
$f \not\equiv 0$ and

$$x \in T(f) \implies \alpha x = \alpha_1 x_1 + \cdots + \alpha_n x_n \ge \beta, \qquad (7)$$

where $\alpha = (\alpha_1, \ldots, \alpha_n)$ is a given real vector and $\beta$ is a real threshold. Then

$$|\{x \in \mathcal{C} \mid \alpha x < \beta\} \cap \max[f]| \le \sum_{x \in \min[f]} p(x).$$

Proof. Suppose that some of the weights $\alpha_1, \ldots, \alpha_n$ are negative, say $\alpha_1 <
0, \ldots, \alpha_k < 0$ and $\alpha_{k+1} \ge 0, \ldots, \alpha_n \ge 0$. Since $\alpha x \ge \beta$ for any $x \in T(f)$ and since f is
monotone, we have $x \in T(f) \implies \alpha^{[k+1)} x \ge \beta - \alpha^{(k]} c$. For any $x \in \mathcal{C}$ we also
have $\{x \mid \alpha x < \beta\} \subseteq \{x \mid \alpha^{[k+1)} x < \beta - \alpha^{(k]} c\}$. Hence it suffices to prove the
lemma for the non-negative weight vector $\alpha^{[k+1)}$ and the threshold $\beta - \alpha^{(k]} c$.
In other words, we can assume without loss of generality that the original weight
vector $\alpha$ is non-negative.

Let $\sigma \in S_n$ be a permutation such that $\alpha_{\sigma_1} \ge \alpha_{\sigma_2} \ge \cdots \ge \alpha_{\sigma_n} \ge 0$. Then
the threshold function

$$g(x) = \begin{cases} 1 & \text{if } \alpha x \ge \beta, \\ 0 & \text{otherwise,} \end{cases}$$

is 2-monotonic with respect to $\sigma$. By (7), we have $g(x) \ge f(x)$ for all $x \in \mathcal{C}$, i.e., g
majorates f. In addition, $F(g) = \{x \in \mathcal{C} \mid \alpha x < \beta\}$, and hence Lemma 2 follows
from Lemma 1. □

We are now ready to show inequality (4) and finish the proof of Theorem
2. Given a non-empty set $\mathcal{X} \subseteq \mathcal{F}_{A,b,c}$, consider the monotone discrete function
$f : \mathcal{C} \to \{0, 1\}$ defined by the condition $\min[f] = \mathcal{X}$. Since (1) is monotone, any
true vector of f also satisfies (1):

$$x \in T(f) \implies a_{k1} x_1 + \cdots + a_{kn} x_n \ge b_k$$

for all $k = 1, \ldots, r$. In addition, $f \not\equiv 0$ because $\mathcal{X} \ne \emptyset$. Thus, by Lemma 2 we
have the inequalities

$$|\{x \mid a_{k1} x_1 + \cdots + a_{kn} x_n < b_k\} \cap \max[f]| \le \sum_{x \in \mathcal{X}} p(x) \qquad (8)$$

for each $k = 1, \ldots, r$. Now, from $\max[f] = \mathcal{I}(\mathcal{X})$ we deduce that

$$\mathcal{I}(\mathcal{F}_{A,b,c}) \cap \mathcal{I}(\mathcal{X}) \subseteq \bigcup_{k=1}^{r} \{x \mid a_{k1} x_1 + \cdots + a_{kn} x_n < b_k\} \cap \max[f],$$

and thus (4) and the theorem follow by (8).

3 Generating Minimal Integer Solutions via Integral
Dualization

The proof of Theorem 3 has two ingredients. First, we show that given a monotone
system (1), the sets $\mathcal{I}(\mathcal{F}_{A,b,c})$ and $\mathcal{F}_{A,b,c}$ can be jointly enumerated by
iteratively solving the dualization problem DUAL($\mathcal{C}, \mathcal{A}, \mathcal{B}$) introduced in Theorem
3. Second, we invoke Theorem 2 and argue that since the number of maximal
infeasible vectors is relatively small, the generation of $\mathcal{F}_{A,b,c}$ polynomially reduces
to the joint generation of $\mathcal{I}(\mathcal{F}_{A,b,c})$ and $\mathcal{F}_{A,b,c}$.

3.1 Joint Generation of Dual Subsets in an Integral Box 

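Let $\mathcal{F} = \mathcal{F}_{A,b,c}$ be the set of minimal integral vectors for (1), and consider the
following problem of jointly generating all points of $\mathcal{F}$ and $\mathcal{I}(\mathcal{F})$:

GEN($\mathcal{F}, \mathcal{I}(\mathcal{F}), \mathcal{A}, \mathcal{B}$): Given two explicitly listed collections $\mathcal{A} \subseteq \mathcal{F}$ and $\mathcal{B} \subseteq
\mathcal{I}(\mathcal{F})$, either find a new point in $(\mathcal{F} \setminus \mathcal{A}) \cup (\mathcal{I}(\mathcal{F}) \setminus \mathcal{B})$, or prove that these
collections are complete: $(\mathcal{A}, \mathcal{B}) = (\mathcal{F}, \mathcal{I}(\mathcal{F}))$.

Proposition 2. Problem GEN($\mathcal{F}, \mathcal{I}(\mathcal{F}), \mathcal{A}, \mathcal{B}$) can be solved in time $\mathrm{poly}(n,
|\mathcal{A}|, |\mathcal{B}|, \log \|c\|_\infty) + T_{dual}$, where $T_{dual}$ denotes the time required to solve
problem DUAL($\mathcal{C}, \mathcal{A}, \mathcal{B}$).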

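Proof. The reduction is via the following Algorithm J:

Step 1. Check whether $\mathcal{B} \subseteq \mathcal{I}(\mathcal{A})$. If there is an $x \in \mathcal{B} \setminus \mathcal{I}(\mathcal{A})$, then $x \notin \mathcal{F}^+$
because $x \in \mathcal{B} \subseteq \mathcal{I}(\mathcal{F})$. This and the inclusion $\mathcal{A} \subseteq \mathcal{F}$ imply that $x \notin \mathcal{A}^+$. Since
$x \notin \mathcal{I}(\mathcal{A})$, we can find a coordinate $j \in \{1, \ldots, n\}$ for which $y = x + e_j \notin \mathcal{A}^+$.
By the maximality of x in $\mathcal{C} \setminus \mathcal{F}^+$, y belongs to $\mathcal{F}^+$, and therefore there must
exist a $z \in \mathcal{F}$ such that $z \le y$. Since $z \notin \mathcal{A}^+$, we have $z \in \mathcal{F} \setminus \mathcal{A}$, i.e., z is a new
minimal integral vector in $\mathcal{F}$ which can be found in $\mathrm{poly}(n, |\mathcal{A}|, |\mathcal{B}|, \log \|c\|_\infty)$
time by performing coordinate binary searches on the box $\{z \in \mathbb{Z}^n \mid 0 \le z \le y\}$.

Step 2 is similar to the previous step: we check whether $\mathcal{A} \subseteq \mathcal{I}^{-1}(\mathcal{B})$, where
$\mathcal{I}^{-1}(\mathcal{B})$ is the set of integral vectors minimal in $\mathcal{C} \setminus \mathcal{B}^-$. If $\mathcal{A}$ contains an element
that is not minimal in $\mathcal{C} \setminus \mathcal{B}^-$, we can find a new point in $\mathcal{I}(\mathcal{F}) \setminus \mathcal{B}$ and halt.

Step 3. Suppose that $\mathcal{B} \subseteq \mathcal{I}(\mathcal{A})$ and $\mathcal{A} \subseteq \mathcal{I}^{-1}(\mathcal{B})$. Then $(\mathcal{A}, \mathcal{B}) = (\mathcal{F}, \mathcal{I}(\mathcal{F}))$
if and only if $\mathcal{B} = \mathcal{I}(\mathcal{A})$. (To see this, assume that $\mathcal{B} = \mathcal{I}(\mathcal{A})$, and suppose on the contrary
that there is an $x \in \mathcal{F} \setminus \mathcal{A}$. Since $x \notin \mathcal{A} = \mathcal{I}^{-1}(\mathcal{B})$ and $x \notin \mathcal{B}^- \subseteq \mathcal{I}(\mathcal{F})^-$, there
must exist a $y \in \mathcal{I}^{-1}(\mathcal{B}) = \mathcal{A} \subseteq \mathcal{F}$ such that $y \le x$. Hence we get two distinct
elements $x, y \in \mathcal{F}$ such that $y \le x$, which contradicts the definition of $\mathcal{F}$. The
existence of an $x \in \mathcal{I}(\mathcal{F}) \setminus \mathcal{B}$ leads to a similar contradiction.) To check the
condition $\mathcal{B} = \mathcal{I}(\mathcal{A})$, we solve problem DUAL($\mathcal{C}, \mathcal{A}, \mathcal{B}$). If $\mathcal{B} \ne \mathcal{I}(\mathcal{A})$, we obtain
a new point $x \in \mathcal{I}(\mathcal{A}) \setminus \mathcal{B}$. By (3), either $x \in \mathcal{F}^+$ or $x \in \mathcal{I}(\mathcal{F})^-$, and we can
decide which of these two cases holds by checking the feasibility of x for (1). In
the first case, we obtain a new point $y \in \{x\}^- \cap (\mathcal{F} \setminus \mathcal{A})$ by performing binary
searches on the coordinates of the box $\{y \in \mathbb{Z}^n \mid 0 \le y \le x\}$. In the second
case, a new point in $\{x\}^+ \cap (\mathcal{I}(\mathcal{F}) \setminus \mathcal{B})$ can be obtained by searching the box
$\{y \in \mathbb{Z}^n \mid x \le y \le c\}$. □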
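The coordinate binary searches used in Steps 1 and 3 can be sketched as follows in Python. The sketch assumes the system is monotone, the box has lower bound 0, and the starting point y is feasible; `feasible` and `minimal_below` are hypothetical helper names.

```python
def feasible(A, b, x):
    """Check Ax >= b row by row."""
    return all(sum(a * xi for a, xi in zip(row, x)) >= bi for row, bi in zip(A, b))

def minimal_below(A, b, y):
    """Given a feasible y, return a minimal feasible z with 0 <= z <= y.

    Each coordinate is lowered as far as feasibility allows by binary search,
    which is valid because feasibility is monotone in every coordinate."""
    z = list(y)
    for j in range(len(z)):
        lo, hi = 0, z[j]          # invariant: z with z[j] = hi is feasible
        while lo < hi:
            mid = (lo + hi) // 2
            z[j] = mid
            if feasible(A, b, z):
                hi = mid
            else:
                lo = mid + 1
        z[j] = hi
    return tuple(z)

print(minimal_below([[1, 1]], [2], (2, 2)))                  # (0, 2)
print(minimal_below([[2, 1], [1, 3]], [4, 3], (5, 5)))       # (0, 4)
```

Each coordinate costs $O(\log \|c\|_\infty)$ feasibility checks, which is the source of the $\log \|c\|_\infty$ term in Proposition 2. The search for a maximal infeasible point above an infeasible x is symmetric.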

Let $\mathcal{F} \subseteq \mathcal{C}$ be an arbitrary antichain, i.e., a system of integral vectors such
that $x \not\le y$ for any two distinct elements $x, y \in \mathcal{F}$. It is easy to see that Algorithm
J and Proposition 2 can be used for any class of antichains $\mathcal{F}$ defined by a
polynomial-time membership oracle for $\mathcal{F}^+$.

3.2 Uniformly Dual-Bounded Antichains 

Extending the definition of dual-bounded hypergraphs in 0, we say that (a class
of antichains) $\mathcal{F} \subseteq \mathcal{C}$ is uniformly dual-bounded if there exists a polynomial p
such that, for any nonempty subset $\mathcal{X} \subseteq \mathcal{F}$, we have

$$|\mathcal{I}(\mathcal{F}) \cap \mathcal{I}(\mathcal{X})| \le p(|\mathcal{X}|).$$

Proposition 3. Suppose that $\mathcal{F}$ is uniformly dual-bounded and there exists
a polynomial-time membership oracle for $\mathcal{F}^+$. Then problem GEN($\mathcal{F}, \mathcal{X}$) is
polynomial-time reducible to problem DUAL($\mathcal{C}, \mathcal{A}, \mathcal{B}$).

Proof. Given a set $\mathcal{X}$ in $\mathcal{F}$, we repeatedly run Algorithm J until it either
produces a new element in $\mathcal{F} \setminus \mathcal{X}$ or proves that $\mathcal{X} = \mathcal{F}$ by generating the entire
family $\mathcal{I}(\mathcal{F})$. By Step 1, as long as Algorithm J outputs elements of $\mathcal{I}(\mathcal{F})$, these
elements also belong to $\mathcal{I}(\mathcal{X})$, and hence the total number of such elements does
not exceed $p(|\mathcal{X}|)$. □

By Theorem 2 the set of minimal integral solutions to any monotone system
of linear inequalities is uniformly dual-bounded, and hence Theorem 3 is a
corollary of Proposition 3.

4 Dualization in Products of Chains 

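Let $\mathcal{C} \stackrel{\mathrm{def}}{=} \mathcal{C}_1 \times \cdots \times \mathcal{C}_n$ be an integer box defined by the product of n chains
$\mathcal{C}_i = [l_i : u_i]$ where $l_i, u_i \in \mathbb{Z}$ are, respectively, the lower and upper bounds of
chain $\mathcal{C}_i$. Given an antichain $\mathcal{A} \subseteq \mathcal{C}$, and an antichain $\mathcal{B} \subseteq \mathcal{I}(\mathcal{A})$, we say that $\mathcal{B}$
is dual to $\mathcal{A}$ if $\mathcal{B} = \mathcal{I}(\mathcal{A})$, i.e., $\mathcal{B}$ contains all the maximal elements of $\mathcal{C} \setminus \mathcal{A}^+$.
If $\mathcal{C}$ is the unit cube, we obtain the familiar notion of dual hypergraphs, where
$\mathcal{I}(\mathcal{A})$ becomes the complementary set of the transversal hypergraph of $\mathcal{A}$. In this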
section, we show how to extend the hypergraph dualization algorithm of to 
arbitrary systems A of integral vectors in a box C. 

As in 0, we shall analyze the running time of the algorithm in terms of the
“volume” $v = v(\mathcal{A}, \mathcal{B}) \stackrel{\mathrm{def}}{=} |\mathcal{A}||\mathcal{B}|$ of the input problem. In general, a given problem
will be decomposed into a number of subproblems which we solve recursively.
Since we have assumed that $\mathcal{B} \subseteq \mathcal{I}(\mathcal{A})$, (3) implies that the following condition
holds for the original problem and all subsequent subproblems:

$$a \not\le b, \quad \text{for all } a \in \mathcal{A},\ b \in \mathcal{B}. \qquad (9)$$

Let $R(v) = R(v(\mathcal{A}, \mathcal{B}))$ denote the number of subproblems that have to be
solved in order to solve the original problem, and let m denote $|\mathcal{A}| + |\mathcal{B}|$, and
$[n] \stackrel{\mathrm{def}}{=} \{1, \ldots, n\}$. We start with the following proposition that provides the base
case for recursion.

Proposition 4. Suppose $\min\{|\mathcal{A}|, |\mathcal{B}|\} \le \mathrm{const}$; then problem DUAL($\mathcal{C}, \mathcal{A}, \mathcal{B}$)
is solvable in polynomial time.

Proof. Let us assume without loss of generality that $\mathcal{B} = \{b^1, \ldots, b^k\}$, for some
constant k. For $t \in [n]^k$ and $i \in [n]$, let $I_i^t = \{j \in [k] \mid t_j = i\}$. Then $\mathcal{C} = \mathcal{A}^+ \cup \mathcal{B}^-$
if and only if for every $t \in [n]^k$ for which

$$b_i^j \ne u_i, \quad \text{for all } i \in [n],\ j \in I_i^t, \qquad (10)$$

there exists an $a \in \mathcal{A}$ such that

$$a_i \le \max\{b_i^j + 1 \mid j \in I_i^t\} \ \text{ if } I_i^t \ne \emptyset, \ \text{ and } a_i = l_i \text{ otherwise.} \qquad (11)$$

To see this, assume first that $\mathcal{C} = \mathcal{A}^+ \cup \mathcal{B}^-$ and consider any $t \in [n]^k$ such that
(10) holds. Let $x \in \mathcal{C}$ be defined by taking $x_i = \max\{b_i^j + 1 \mid j \in I_i^t\}$ if $I_i^t \ne \emptyset$,
and $x_i = l_i$ otherwise. Then $x \in \mathcal{C} \setminus \mathcal{B}^-$ and hence $x \in \mathcal{A}^+$, implying that there is
an $a \in \mathcal{A}$ satisfying (11). On the other hand, let us assume that for every $t \in [n]^k$
satisfying (10), there is an $a \in \mathcal{A}$ for which (11) holds. Consider an $x \in \mathcal{C} \setminus \mathcal{B}^-$;
then there must exist, for every $j \in [k]$, a $t_j \in [n]$ such that $x_{t_j} \ge b^j_{t_j} + 1$.
Clearly $t = (t_1, \ldots, t_k) \in [n]^k$ satisfies (10), and therefore there is an $a \in \mathcal{A}$
such that $a_i \le \max\{b_i^j + 1 \mid j \in I_i^t\} \le x_i$ if $I_i^t \ne \emptyset$, and $a_i = l_i$ otherwise. This
gives $x \in \mathcal{A}^+$. □
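The criterion in the proof can be turned into a direct test, sketched below in Python. It assumes $\mathcal{A}, \mathcal{B} \subseteq \mathcal{C}$ are given as lists of tuples, that the box is described by bound vectors `l` and `u`, and that coordinates are 0-based; the function name is hypothetical. The loop visits all $n^k$ tuples t, so it is polynomial only for constant $k = |\mathcal{B}|$.

```python
from itertools import product

def dual_test_small_B(A, B, l, u):
    """Decide whether C = A^+ ∪ B^- using conditions (10) and (11).

    Returns (True, None) if every point of the box is covered, otherwise
    (False, x) with a witness x lying in neither A^+ nor B^-."""
    n, k = len(l), len(B)
    for t in product(range(n), repeat=k):
        # Condition (10): b^j must be strictly below the upper bound in coordinate t_j.
        if any(B[j][t[j]] == u[t[j]] for j in range(k)):
            continue
        # Least point of the box that exceeds each b^j in coordinate t_j.
        x = list(l)
        for j in range(k):
            i = t[j]
            x[i] = max(x[i], B[j][i] + 1)
        # Condition (11): some a in A must lie below x.
        if not any(all(ai <= xi for ai, xi in zip(a, x)) for a in A):
            return False, tuple(x)
    return True, None

l, u = (0, 0), (2, 2)
A = [(1, 1)]
print(dual_test_small_B(A, [(0, 2), (2, 0)], l, u))  # (True, None)
print(dual_test_small_B(A, [(0, 2)], l, u))          # (False, (1, 0))
```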

Remark. Having found an $x \in \mathcal{C} \setminus (\mathcal{A}^+ \cup \mathcal{B}^-)$, it is always possible to extend
it to a maximal point with the same property in $O(nm \log m)$ time as follows.
Let $Q_i = \{a_i - 1 \mid a \in \mathcal{A}\} \cup \{x_i, u_i\}$, $i = 1, \ldots, n$, and assume that this list
is kept in sorted order for each i. For $i = 1, \ldots, n$, we iterate $x_i \leftarrow \max\{z \in
Q_i \mid (x_1, \ldots, x_{i-1}, z, x_{i+1}, \ldots, x_n) \notin \mathcal{A}^+\}$. Then the resulting point x is maximal
in $\mathcal{C} \setminus (\mathcal{A}^+ \cup \mathcal{B}^-)$.
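A minimal Python sketch of this extension step follows. It assumes $\mathcal{A} \subseteq \mathcal{C}$ is a list of tuples, `u` holds the upper bounds of the box, and x starts outside $\mathcal{A}^+$; for brevity it scans the candidate list linearly instead of using the binary search that gives the stated time bound. The function names are hypothetical.

```python
def in_upset(A, x):
    """x lies in A^+ if it dominates some a in A."""
    return any(all(xi >= ai for xi, ai in zip(x, a)) for a in A)

def extend_to_maximal(A, x, u):
    """Raise x coordinate by coordinate while staying outside A^+."""
    x = list(x)
    for i in range(len(x)):
        # Q_i: the only values at which membership in A^+ can change, plus x_i and u_i.
        candidates = sorted({a[i] - 1 for a in A} | {x[i], u[i]}, reverse=True)
        for z in candidates:
            if z > u[i]:
                continue
            if z < x[i]:
                break
            trial = x[:i] + [z] + x[i + 1:]
            if not in_upset(A, trial):
                x[i] = z
                break
    return tuple(x)

print(extend_to_maximal([(2, 0), (0, 2)], (0, 0), (3, 3)))  # (1, 1)
```

Since coordinates are only raised, a point that starts outside $\mathcal{B}^-$ remains outside it.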

Now given two integral antichains $\mathcal{A}, \mathcal{B}$ that satisfy the necessary duality
condition (9), we proceed as follows:

Step 1. If $\min\{|\mathcal{A}|, |\mathcal{B}|\} \le 2$, the duality of $\mathcal{A}$ and $\mathcal{B}$ can be tested in $O(n^3 m)$
time using Proposition 4.

Step 2. For each $k \in [n]$:

1. if $a_k > u_k$ for some $a \in \mathcal{A}$ ($b_k < l_k$ for some $b \in \mathcal{B}$), then a (respectively,
b) can be clearly discarded from further consideration;

2. if $a_k < l_k$ for some $a \in \mathcal{A}$ ($b_k > u_k$ for some $b \in \mathcal{B}$), we set $a_k \leftarrow l_k$
(respectively, $b_k \leftarrow u_k$). Note that the duality condition (9) continues to hold
after such replacements.

Thus we may assume for the next steps that $\mathcal{A}, \mathcal{B} \subseteq \mathcal{C}$.

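Step 3. Let $a^o \in \mathcal{A}$, $b^o \in \mathcal{B}$. By (9), there exists an $i \in [n]$ such that $a^o_i > b^o_i$.
Assume, with no loss of generality, that i = 1 and set $\mathcal{C}'_1 \leftarrow [a^o_1 : u_1]$, $\mathcal{C}''_1 \leftarrow [l_1 :
a^o_1 - 1]$. (Alternatively, we may set $\mathcal{C}''_1 \leftarrow [l_1 : b^o_1]$ and $\mathcal{C}'_1 \leftarrow [b^o_1 + 1 : u_1]$.) Define

$$\mathcal{A}'' = \{a \in \mathcal{A} \mid a_1 < a^o_1\}, \qquad \mathcal{A}' = \mathcal{A} \setminus \mathcal{A}'', \qquad \epsilon_1^A = \frac{|\mathcal{A}'|}{|\mathcal{A}|},$$

$$\mathcal{B}' = \{b \in \mathcal{B} \mid b_1 \ge a^o_1\}, \qquad \mathcal{B}'' = \mathcal{B} \setminus \mathcal{B}', \qquad \epsilon_1^B = \frac{|\mathcal{B}''|}{|\mathcal{B}|}.$$

Observe that $\epsilon_1^A > 0$ and $\epsilon_1^B > 0$ since $a^o \in \mathcal{A}'$ and $b^o \in \mathcal{B}''$.

Denoting by $\mathcal{C}' = \mathcal{C}'_1 \times \mathcal{C}_2 \times \cdots \times \mathcal{C}_n$ and $\mathcal{C}'' = \mathcal{C}''_1 \times \mathcal{C}_2 \times \cdots \times \mathcal{C}_n$ the two
half-boxes of $\mathcal{C}$ induced by the above partitioning, it is then easy to see that $\mathcal{A}$
and $\mathcal{B}$ are dual in $\mathcal{C}$ if and only if

$$\mathcal{A}, \mathcal{B}' \text{ are dual in } \mathcal{C}', \text{ and} \qquad (12)$$

$$\mathcal{A}'', \mathcal{B} \text{ are dual in } \mathcal{C}''. \qquad (13)$$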

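Step 4. Define $\epsilon(v) = 1/\chi(v)$, where $\chi(v)^{\chi(v)} = v = v(\mathcal{A}, \mathcal{B})$. If $\min\{\epsilon_1^A, \epsilon_1^B\} >
\epsilon(v)$, we use the decomposition rule given above, which amounts to solving
recursively two subproblems (12), (13) of respective volumes:

$$v(\mathcal{A}, \mathcal{B}') = |\mathcal{A}||\mathcal{B}'| = |\mathcal{A}|(1 - \epsilon_1^B)|\mathcal{B}| = (1 - \epsilon_1^B)\, v(\mathcal{A}, \mathcal{B}) \le (1 - \epsilon(v))\, v,$$

$$v(\mathcal{A}'', \mathcal{B}) = |\mathcal{A}''||\mathcal{B}| = (1 - \epsilon_1^A)|\mathcal{A}||\mathcal{B}| = (1 - \epsilon_1^A)\, v(\mathcal{A}, \mathcal{B}) \le (1 - \epsilon(v))\, v.$$

This gives rise to the recurrence

$$R(v) \le 1 + R((1 - \epsilon_1^B) v) + R((1 - \epsilon_1^A) v) \le 1 + 2R((1 - \epsilon(v)) v).$$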

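Step 5. Let us now suppose that $\epsilon_1^B \le \epsilon(v)$. In this case, we begin by solving
subproblem (12). If $\mathcal{A}, \mathcal{B}'$ are not dual in $\mathcal{C}'$, we get a point x maximal in $\mathcal{C}' \setminus
[\mathcal{A}^+ \cup (\mathcal{B}')^-]$, and we are done. Otherwise we claim that

$$\mathcal{A}'', \mathcal{B} \text{ are dual in } \mathcal{C}'' \iff \forall a \in \tilde{\mathcal{A}}:\ \mathcal{A}'', \mathcal{B}'' \text{ are dual in } \mathcal{C}''(a), \qquad (14)$$

where $\tilde{\mathcal{A}} = \{a \in \mathcal{A} \mid a_1 \le a^o_1\}$, and $\mathcal{C}''(a) = \mathcal{C}''_1 \times [a_2 : u_2] \times \cdots \times [a_n : u_n]$.

Proof of (14). The forward direction does not use (12). Suppose that there is
an $x \in \mathcal{C}''(a) \setminus [(\mathcal{A}'')^+ \cup (\mathcal{B}'')^-]$ for some $a \in \tilde{\mathcal{A}}$; then $x_i \ge a_i$ for $i = 2, \ldots, n$. If
$x \in (\mathcal{B}')^-$, i.e., $x \le b$ for some $b \in \mathcal{B}'$, then by the definition of $\mathcal{B}'$, $b_1 \ge a^o_1$. On
the other hand, $a \in \tilde{\mathcal{A}}$ implies that $a_1 \le a^o_1$. But then,

$$(a_1, a_2, \ldots, a_n) \le (a_1, x_2, \ldots, x_n) \le (b_1, b_2, \ldots, b_n),$$

which contradicts the assumed duality condition (9). This shows that $x \in \mathcal{C}'' \setminus
[(\mathcal{A}'')^+ \cup (\mathcal{B}' \cup \mathcal{B}'')^-]$.

For the other direction, let $x \in \mathcal{C}'' \setminus [(\mathcal{A}'')^+ \cup \mathcal{B}^-]$. Since $x \notin (\mathcal{B}')^-$ and
$x = (x_1, x_2, \ldots, x_n) \le y \stackrel{\mathrm{def}}{=} (a^o_1, x_2, \ldots, x_n)$, the vector y is also not covered by
$\mathcal{B}'$. Thus $y \in \mathcal{C}' \setminus (\mathcal{B}')^-$. We conclude therefore, assuming (12), that $y \in \mathcal{A}^+$, i.e.,
there is an $a \in \mathcal{A}$ such that $a \le y$. But this implies that $a \in \tilde{\mathcal{A}}$ and hence that
$x \in \mathcal{C}''(a) \setminus [(\mathcal{A}'')^+ \cup (\mathcal{B}'')^-]$ for some $a \in \tilde{\mathcal{A}}$. □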

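It follows by (14) that, once we discover that (12) holds, we can reduce the
solution of subproblem (13) to solving $|\tilde{\mathcal{A}}|$ subproblems, each of which has a
volume of $v(\mathcal{A}'', \mathcal{B}'') \le \epsilon_1^B\, v(\mathcal{A}, \mathcal{B})$. Thus we obtain the recurrence

$$R(v) \le 1 + R((1 - \epsilon_1^B) v) + |\mathcal{A}|\, R(\epsilon_1^B v) \le R((1 - \epsilon_1^B) v) + \frac{v}{2} R(\epsilon_1^B v),$$

where the last inequality follows from $|\mathcal{A}| \le v/3$ and $v \ge 9$.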

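Step 6. Finally, if $\epsilon_1^A \le \epsilon(v) < \epsilon_1^B$, we solve subproblem (13), and if we discover
that $\mathcal{A}'', \mathcal{B}$ are dual in $\mathcal{C}''$, we obtain the following rule, symmetric to (14):

$$\mathcal{A}, \mathcal{B}' \text{ are dual in } \mathcal{C}' \iff \forall b \in \tilde{\mathcal{B}}:\ \mathcal{A}', \mathcal{B}' \text{ are dual in } \mathcal{C}'(b),$$

where $\tilde{\mathcal{B}} = \{b \in \mathcal{B} \mid b_1 \ge a^o_1 - 1\}$, and $\mathcal{C}'(b) = \mathcal{C}'_1 \times [l_2 : b_2] \times \cdots \times [l_n : b_n]$. This
reduces our original problem into one subproblem of volume $\le (1 - \epsilon_1^A) v$, plus
$|\tilde{\mathcal{B}}|$ subproblems, each of volume at most $\epsilon_1^A v$, thus giving the recurrence

$$R(v) \le 1 + R((1 - \epsilon_1^A) v) + |\mathcal{B}|\, R(\epsilon_1^A v) \le R((1 - \epsilon_1^A) v) + \frac{v}{2} R(\epsilon_1^A v).$$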

Using induction on $v \ge 9$, it can be shown that the above recurrences imply
that $R(v) \le v^{\chi(v)}$ (see [B|). As $\chi(m^2) < 2\chi(m)$ and $v(\mathcal{A}, \mathcal{B}) < m^2$, we get
$\chi(v) < \chi(m^2) < 2\chi(m) \sim 2 \log m / \log\log m$. Let us also note that every step
above can be implemented in at most $O(n^3 m)$ time, independent of the chain
sizes $|\mathcal{C}_i|$. This establishes the bound stated in Theorem 4.
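To get a feeling for how slowly $\chi$ grows, the equation $\chi^\chi = v$ can be solved numerically. The Python sketch below uses bisection on $\chi \log \chi = \log v$; the function name `chi` and the tolerance are assumptions made for illustration.

```python
import math

def chi(v, tol=1e-12):
    """Solve chi**chi == v for chi >= 1 (v >= 1) by bisection on chi*log(chi) = log(v)."""
    target = math.log(v)
    lo, hi = 1.0, max(2.0, math.log2(v) + 1.0)   # hi**hi >= v for every v >= 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid * math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for m in (10, 100, 10**6):
    # chi(m**2) stays below 2*chi(m), and epsilon(v) = 1/chi(v) shrinks slowly
    print(m, round(chi(m), 3), round(chi(m * m), 3), round(1 / chi(m * m), 3))
```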
Abstract. Integer division has been known since 1986 [4,13,12] to be in
slightly non-uniform $\mathrm{TC}^0$, i.e., computable by polynomial-size, constant
depth threshold circuits. This has been perhaps the outstanding natural
problem known to be in a standard circuit complexity class, but not
known to be in its uniform version. We show that indeed division is in
uniform $\mathrm{TC}^0$. A key step of our proof is the discovery of a first-order
formula expressing exponentiation modulo any number of polynomial
size.



1 Introduction 

The exact complexity of integer division has been harder to pin down than the
complexities of addition, subtraction, and multiplication. In 1986, Beame, Cook,
and Hoover showed that iterated multiplication, and thus division, could be
performed by Boolean circuits of logarithmic depth ($\mathrm{NC}^1$ circuits) [4]. In 1987,
Reif showed that these circuits could be implemented as constant depth circuits
containing threshold gates ($\mathrm{TC}^0$ circuits) [12,13]. Since then, the remaining
issue has been the complexity of constructing these circuits. Division is the only
prominent natural problem whose computation uses non-uniform circuits, that is,
circuits which require a non-trivial amount of computation for their construction.

The division problem discussed in this paper is the division of two n-bit 
integers, given in binary, yielding their integer quotient, also in binary. A related 
problem is the multiplication of n n-bit integers, computing their product as 
a binary integer. These problems are easily reduced to each other, so that a 
uniform circuit for one yields a uniform circuit for the other. 

In this paper, we construct uniform constant depth circuits for division and
iterated multiplication. We work within the framework of descriptive complexity,
and show that there is a first-order formula using majority quantifiers that
expresses division. This implies that there is an FO-uniform $\mathrm{TC}^0$ circuit performing
division [3]. First-order (FO) uniformity, equivalent to DLOGTIME uniformity,
is the strongest uniformity requirement found to be generally applicable. A key
step focuses on the one step of the $\mathrm{TC}^0$ division computation not previously
known to be expressible by a first order formula with majority quantifiers (an
FO(M) formula). This is the problem of finding powers in the finite field $\mathbb{Z}_p$,
the integers modulo a prime, where p has $O(\log n)$ bits. We show that there is a
first-order formula without majority quantifiers computing powers in $\mathbb{Z}_p$. Thus
this subproblem is in FO, and can be computed with uniform $\mathrm{AC}^0$ circuits.

* Supported by NSF grant CCR-9877078.

2 Definitions 

We will express division as a predicate DIVISION(X, Y, i) which is true if and
only if bit i of $\lfloor X/Y \rfloor$ is 1. We denote numbers with n or $n^{O(1)}$ bits by capital
letters, and numbers with $O(\log n)$ bits by lowercase letters. We also refer to
numbers with $O(\log n)$ bits as small, and those with $n^{O(1)}$ bits as large. We
will always note the size of numbers with $(\log n)^{O(1)}$ bits explicitly. The iterated
multiplication problem will be written as the predicate IMULT($A_1, \ldots, A_n$, i),
which is true if bit i of $\prod_{j=1}^{n} A_j$ is 1; here i ranges from 0 to $n^2$, and so has $2 \log n$
bits.
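As a point of reference for what these predicates compute, the following Python sketch gives their semantics directly with arbitrary-precision integers. It says nothing about circuits or first-order definability; the function names and the convention that bit 0 is the least significant bit are assumptions made for illustration.

```python
from math import prod

def bit(i, x):
    """True if bit i of x (bit 0 = least significant) is 1."""
    return (x >> i) & 1 == 1

def division(X, Y, i):
    """DIVISION(X, Y, i): bit i of floor(X / Y) is 1 (Y > 0 assumed)."""
    return bit(i, X // Y)

def imult(As, i):
    """IMULT(A_1, ..., A_n, i): bit i of the product of the A_j is 1."""
    return bit(i, prod(As))

# floor(100 / 7) = 14 = 0b1110
print([division(100, 7, i) for i in range(4)])   # [False, True, True, True]
# 3 * 5 * 7 = 105 = 0b1101001
print([imult([3, 5, 7], i) for i in range(7)])
```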

Though the size of the input to division is $2n + \log n$ and the input to iterated
multiplication has size $n^2 + 2 \log n$, we will consider the input size, for all problems
in this paper, to be n, as the circuit complexity classes and descriptive complexity 
classes we consider are closed under a polynomial change in the input size. 

In this paper we produce simple logical formulas expressing these predicates. 
A problem is in the complexity class FO (first order) if the predicate correspond- 
ing to the decision problem can be expressed by a first order formula interpreted 
over a finite universe, the set of natural numbers 0,...,n. The inputs to the 
problem are encoded as relations over the universe, and are available to be used 
in the formula. The fixed numeric relations < and BIT are also available¹. For
example, the n bits of the input X to DIVISION are represented by the values of
a unary predicate X() on the elements of the universe: X(1), X(2), . . . , X(n). An
$n^2$ bit input can be represented by a binary predicate, so the inputs $A_1, \ldots, A_n$
to IMULT are represented as a binary predicate A. Short inputs to a problem,
like i, the index of the result bit queried, may be represented by a constant in
the range 0, . . . , n, which can also be regarded as a free variable. Since an FO or
FO(M) formula over the universe $1, \ldots, n^k$ can be simulated by an equivalent
formula over the universe 1, . . . , n, DIVISION and IMULT with inputs X, Y,
and $A_i$ having $n^k$ bits, encoded by k-ary relations over 0, . . . , n, are in the same
descriptive complexity class as DIVISION and IMULT with n-bit inputs.

DIVISION and IMULT are provably not in FO, as parity is FO reducible to 
them, and parity is not in FO [8,9]. They will be shown to be in the class FO(M), 
problems described by first-order logic plus the majority quantifier. The majority
quantifier (Mx) can appear anywhere that an (∃x) or a (∀x) can appear. The
formula (Mx)φ(x) is true iff φ(j) is true for more than half the values 0 < j < n.
These quantifiers let us count the number of 1 bits in a string of length n; the
counting quantifiers (∃!i x) are definable in terms of (Mx). These quantifiers are
analogous to gates with n inputs that output 1 iff at least i of their inputs are
1, called threshold gates. We see next how an FO(M) formula is equivalent to a
circuit containing threshold gates.

^1 Following [10], we consider FO to include ordering and BIT. The BIT predicate
allows us to look at the bits of numbers. BIT(i, x) is true if bit i of the number x
written in binary is 1. This is equivalent to having addition and multiplication on
numbers between 0 and n.
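
The BIT predicate, the majority quantifier, and a threshold gate can be illustrated with a few lines of Python. This is only a sketch: the names `bit`, `majority`, and `threshold_gate` are hypothetical, and the sketch assumes the quantifier ranges over the whole universe 0, . . . , n.

```python
def bit(i, x):
    """BIT(i, x): bit i of x written in binary is 1."""
    return (x >> i) & 1 == 1

def majority(phi, n):
    """(Mx) phi(x): phi holds for more than half of the universe.

    Assumption: the universe is 0, ..., n (n + 1 elements).
    """
    universe = range(n + 1)
    satisfied = sum(1 for j in universe if phi(j))
    return 2 * satisfied > len(universe)

def threshold_gate(inputs, t):
    """Outputs True iff at least t of the Boolean inputs are True."""
    return sum(1 for value in inputs if value) >= t
```

Here `majority(lambda j: j >= 2, 6)` holds because five of the seven universe elements satisfy the formula, which is the same test a threshold gate with threshold 4 would perform on the seven truth values.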

A TC^0 circuit is a constant-depth, polynomial-size circuit with AND, OR,
NOT, and threshold gates with arbitrary fanin. If the type of each gate and the
connections between gates can be computed by a deterministic logtime Turing
machine, or equivalently by an FO formula, then the circuit is FO-uniform.
The equivalence of FO-uniform TC^0 circuits and FO(M) formulas is shown by
Barrington, Immerman, and Straubing in [3]. The Boolean function computed
by an FO-uniform TC^0 circuit can be computed by an FO(M) formula, a first
order formula using majority quantifiers, ordering, and BIT. The converse also
holds; any FO(M) formula can be turned into a uniform TC^0 circuit. Here and
throughout the paper, uniform will mean FO-uniform.

TC^0 is contained in the circuit complexity class NC^1, which contains all
problems decided by Boolean circuits containing NOT gates, and AND and OR gates
with two inputs, with $n^{O(1)}$ gates, n inputs, and depth O(log n). TC^0 contains
the class AC^0 of constant depth polynomial size circuits without threshold gates.
FO-uniform AC^0 circuits are equivalent to FO formulas with only existential and
universal quantifiers, no majority quantifiers [3].



3 Previous Work 



As stated in the introduction, Beame, Cook, and Hoover gave NC^1 circuits
deciding DIVISION and IMULT in 1986 [4]. They also gave a polynomial
time algorithm for constructing the n'th circuit. Reif showed how to convert these
to constant-depth threshold circuits a year later [13,12]. Immerman and Landau
then observed that the construction was logspace uniform given the product of
the first P primes, implying that the full construction was TC^1 uniform [11].

These circuits were based on finding the remainders of the inputs on division 
by a set of small primes. The value of a number modulo a set of primes uniquely 
determines its value modulo the product of those primes. This is referred to as 
the Chinese remainder representation (CRR). The circuits work by converting 
the inputs to CRR, computing iterated products in that representation, and
converting the output to binary. In the later 1990s, Chiu, Davida, and Litow devised
new ways of computing in CRR that reduced the complexity of converting from
CRR into binary [5,6]. These steps allowed them to construct logspace-uniform
and NC^1-uniform TC^0 circuits for division and iterated multiplication.
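
The three stages of the CRR approach (convert to residues, multiply residue-wise, convert back) can be sketched in Python as follows. This is an ordinary sequential illustration with hypothetical function names, not the circuit construction of [4,5,6]; the conversion back to binary uses the textbook Chinese remainder formula, and `pow(x, -1, p)` for modular inverses needs Python 3.8 or later.

```python
from math import prod

def to_crr(x, primes):
    """Chinese remainder representation: x modulo each prime."""
    return [x % p for p in primes]

def crr_product(numbers, primes):
    """Iterated product computed independently modulo each prime."""
    residues = [1] * len(primes)
    for x in numbers:
        residues = [(r * (x % p)) % p for r, p in zip(residues, primes)]
    return residues

def from_crr(residues, primes):
    """Recover the value modulo the product of the primes."""
    modulus = prod(primes)
    total = 0
    for r, p in zip(residues, primes):
        cofactor = modulus // p
        total += r * cofactor * pow(cofactor, -1, p)
    return total % modulus
```

The result is exact as long as the true product is smaller than the product of the primes, which is why the circuits need enough small primes to cover the size of the answer.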

Allender and Barrington reinterpreted those results in the framework of de- 
scriptive complexity, and showed that the only difficulty in expressing iterated 
multiplication and division in FO(M) was the difficulty of raising numbers to 
a power modulo a small prime [2]. The current paper completes this effort by 
showing that this power predicate lies in FO. As division is complete for FO(M) 
via FO Turing reductions, it is unlikely that the complexity of division can be 
further reduced.

4 Division Reduces to POW



The key problem examined by this paper is POW, the predicate expressing 
exponentiation modulo a prime. For a small prime p, and small arguments a, r, 
and b, 

$$\mathrm{POW}(a, r, b, p) \iff a^r \equiv b \pmod{p} .$$

To be exact, we have a family of problems $\mathrm{POW}_{k \log n}(a, r, b, p)$ for k = 1, 2, . . .,
where the inputs to $\mathrm{POW}_{k \log n}$ have size k log n. An input with k log n bits can be
represented by a k-tuple of variables taking values in 0, . . . , n. Thus $\mathrm{POW}_{k \log n}$ is
a 4k-ary numeric relation. Though the inputs have O(log n) bits, we consider the
descriptive complexity of this problem as if it had input size n. We ask whether 
this predicate can be represented by FO or FO(M) formulas over the universe 
0, ...,n. 

Allender, Barrington, and the author showed that DIVISION and IMULT 
are in FO(M) if and only if POW is in FO(M) [2]. They did this by showing 
that DIVISION and IMULT are FO-Turing reducible to POW. A version of 
this proof, with additional simplifications, is in the full version of this paper. 
The predicate POW is used to convert inputs from binary to CRR, and to find 
discrete logarithms in the multiplicative group $Z_p^*$ of integers mod p for primes
p in the CRR basis. 

FO-Turing reducibility in descriptive complexity classes is formally defined 
using generalized quantifiers in [10]. In the case of an FO-Turing reduction to 
POW, we shall not use this full formal definition, but a simpler characterization 
of FO-Turing reducibility to a relation. In the case of POW(a, r, b, p), which
could be considered as a primitive numeric relation of arity 4k (if the inputs
have k log n bits), we can express FO(M) Turing reducibility to POW by simply
saying a predicate φ is FO(M) Turing reducible to POW if and only if there is
an FO(M) formula with numeric relations <, BIT, and POW that expresses φ.
This is equivalent to saying φ ∈ FO(M,POW). Clearly, if POW is in FO(M),
then FO(M,POW) = FO(M): we replace all uses of POW in a formula φ with the
equivalent FO(M) formula. This is all we shall need to use about the reduction
from DIVISION and IMULT to POW. 



5 POW Is FO-Turing Reducible to $\mathrm{IMULT}_{O(\log n)}$ and $\mathrm{DIVISION}_{O((\log n)^2)}$

We now show that we can produce an FO formula deciding POW, provided that 
we allow the formula to use the results of certain smaller IMULT and DIVISION 
problems. These problems will have inputs constructed by FO formulas from 
the inputs to POW, or from the outputs of other small IMULT and DIVISION 
problems. This can be characterized as an FO-Turing reduction from POW to 
these smaller versions of IMULT and DIVISION. Later we will show that these 
smaller versions of IMULT and DIVISION are in FO(M), and then show that 
they are in FO.

The scaled versions of IMULT and DIVISION have $(\log n)^{O(1)}$-bit inputs. We
still consider them as having input size n, however, so we shall define them as

$$\mathrm{IMULT}_{(\log n)^k}(A_1, \ldots, A_n, j) = \mathrm{IMULT}(A_1, \ldots, A_n, j) \wedge (\forall i)\, A_i < 2^{(\log n)^k} \wedge (\forall i > (\log n)^k)\, A_i = 1$$

$$\mathrm{DIVISION}_{(\log n)^k}(X, Y, i) = \mathrm{DIVISION}(X, Y, i) \wedge X < 2^{(\log n)^k} \wedge Y < 2^{(\log n)^k} .$$

Thus we only have to give correct answers to the problems when the inputs 
are small. An FO Turing reduction to these problems is more complicated than 
an FO(M) Turing reduction to POW because the inputs to these problems have 
ω(log n) bits and so must be given as relations, not as first-order variables. We
shall only point out where these problems are used in our first-order expression 
for POW, and state the implications if we have FO or FO(M) expressions for 
them. 

To show that POW is in FO, we will prove a more general lemma about 
finding powers in groups. This is interesting in its own right, and necessary 
for the extension to finding powers modulo prime power moduli. We consider a 
group to be given in FO if group elements are labeled by elements of the universe 
and the product operation is given by an FO formula. Note that the identity 
element and inverse operation can be defined in FO from the product operation. 
We can also continue to use arithmetic operations on the universe, considered 
as the numbers 0, . . . , n. 

Lemma 1. Finding small powers in any group of order n is FO Turing-reducible
to finding the product of log n elements.

Proof. Suppose we want to find $a^r$, where a is an element of a group of order n.
We will compute a set of elements $a_1, \ldots, a_k$ and exponents $u, u_1, \ldots, u_k$ such
that

$$a^r = a^u a_1^{u_1} \cdots a_k^{u_k}$$

and $u_i < 2 \log n$, $u < 2(\log n)^2$.

Step 1. We choose a set of k = o(log n) primes $d_1, \ldots, d_k$, such that $d_i < 2 \log n$
and $d_i$ is relatively prime to n, for all i. We choose them such that
$n < D = d_1 d_2 \cdots d_k < n^2$. We can do this with a first order formula by choosing the first
D > n such that D is square-free, D and n are relatively prime, and all prime
factors of D are less than 2 log n. We can decide, given D, whether a number is
one of our $d_i$ or not. To compute the number k from D, and to find our list $d_i$ as
a relation between i and $d_i$, requires, for each prime $p_0 < 2 \log n$, counting the
number of primes p dividing D which are less than $p_0$. We can do this using the
BITSUM predicate, which counts the number of one bits in a log n bit number:
BITSUM(x, y) is true if the binary representation of x contains y ones. This is
shown to be in FO in [3].

Step 2. We calculate $a_i = a^{\lfloor n/d_i \rfloor}$ as follows:

First we calculate $n_i = n \bmod d_i$. Compute $a^{-1}$ using the inverse operation.
We find $a^{-n_i}$ by multiplying $n_i$ copies of $a^{-1}$ together. This is one place where
our Turing reduction to multiplication of log n group elements is used.
We can find $a^{\lfloor n/d_i \rfloor}$ by observing that

$$\left(a^{\lfloor n/d_i \rfloor}\right)^{d_i} = a^{\lfloor n/d_i \rfloor d_i} = a^{n - (n \bmod d_i)} = a^{n - n_i} = a^{-n_i} .$$

Observe that there is exactly one group element x such that $x^{d_i} = a^{-n_i}$. Let $d_i^{-1}$
be the multiplicative inverse to $d_i$ mod n, i.e. that $d_i d_i^{-1} = mn + 1$ for some m.
Then

$$x = x^{mn+1} = \left(x^{d_i}\right)^{d_i^{-1}} = \left(a^{-n_i}\right)^{d_i^{-1}} .$$

Thus we can find $a_i = a^{\lfloor n/d_i \rfloor}$ as the value of x in the expression

$$(\exists x)\; x^{d_i} = a^{-n_i} .$$

We compute $x^{d_i}$ using multiplication of log n elements. We could not compute
$a^{\lfloor n/d_i \rfloor}$ directly as $(a^{-n_i})^{d_i^{-1}}$ since $d_i^{-1}$ is not necessarily O(log n).

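Step 3. Now we find the exponents $u, u_1, \ldots, u_k$ such that $a^u a_1^{u_1} \cdots a_k^{u_k} = a^r$.

Since $a_i = a^{\lfloor n/d_i \rfloor}$,

$$a_1^{u_1} \cdots a_k^{u_k} = a^{\sum_{i=1}^{k} u_i \lfloor n/d_i \rfloor} ,$$

and since $a^r = a^u a_1^{u_1} \cdots a_k^{u_k} = a^{u + \sum_{i=1}^{k} u_i \lfloor n/d_i \rfloor}$,

$$u \equiv r - \sum_{i=1}^{k} u_i \lfloor n/d_i \rfloor \pmod{n} .$$

Thus, to make the final correction term $a^u$ computable, we must make u as
small as possible, and so we want to make $\sum_i u_i \lfloor n/d_i \rfloor \bmod n$ as close to r as
possible. To approximate r as a linear combination of the $\lfloor n/d_i \rfloor$, we use the Chinese
remainder theorem.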

Compute $f = \lfloor rD/n \rfloor$. This step requires r to have O(log n) bits. Using the
Chinese remainder theorem, if we let $D_i = D/d_i$, and let $u_i = f D_i^{-1} \bmod d_i$,
then

$$\sum_{i=1}^{k} u_i D_i \equiv f \pmod{D} . \qquad \text{Let } m \text{ be s.t. } \sum_{i=1}^{k} u_i D_i = f + mD .$$

We can calculate $u_i$ in FO, since we can guess the possibilities for $D_i^{-1}$ in
FO. Calculating u from the $u_i$ involves a sum of k small numbers, which, since
k < log n, is in FO. This, again, uses the fact that BITSUM is in FO.

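We now show that $u < 2(\log n)^2 + 1$. We calculate the difference between r and
$\sum_{i=1}^{k} u_i \lfloor n/d_i \rfloor$:

$$
\begin{aligned}
\sum_{i=1}^{k} u_i \lfloor n/d_i \rfloor
  &= \sum_{i=1}^{k} \frac{u_i n}{d_i} - \sum_{i=1}^{k} u_i \left( \frac{n}{d_i} - \lfloor n/d_i \rfloor \right) \\
  &= \frac{n}{D} \sum_{i=1}^{k} u_i D_i - \sum_{i=1}^{k} u_i \left( \frac{n}{d_i} - \lfloor n/d_i \rfloor \right) \\
  &= \frac{n}{D} \lfloor rD/n \rfloor + nm - \sum_{i=1}^{k} u_i \left( \frac{n}{d_i} - \lfloor n/d_i \rfloor \right) \\
  &= r - \frac{n}{D} \left( \frac{rD}{n} - \lfloor rD/n \rfloor \right) + nm - \sum_{i=1}^{k} u_i \left( \frac{n}{d_i} - \lfloor n/d_i \rfloor \right) .
\end{aligned}
$$

Hence

$$u \equiv \frac{n}{D} \left( \frac{rD}{n} - \lfloor rD/n \rfloor \right) + \sum_{i=1}^{k} u_i \left( \frac{n}{d_i} - \lfloor n/d_i \rfloor \right) \pmod{n} .$$

The quantity $y - \lfloor y \rfloor$ is always between 0 and 1, and since n/D < 1, $u_i < 2 \log n$,
and k < log n, we see that $u < 2(\log n)^2 + 1$. Thus we can calculate $a^u$
using two rounds of multiplying log n group elements.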



Thus we have described group elements $a_i$ and numbers $u, u_i$ such that
$a^u a_1^{u_1} \cdots a_k^{u_k} = a^r$ and the computation of $a^u a_1^{u_1} \cdots a_k^{u_k}$ is FO Turing reducible
to the product of log n group elements. □
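
The three steps of the proof can be traced in ordinary Python. The sketch below is an illustration under stated assumptions, not the first-order construction itself: existential quantifiers are replaced by brute-force searches over the group, logarithms are taken base 2, and the names `small_power`, `choose_basis`, and `group_power` are hypothetical. The group is passed in as a product function `mul`, its identity, and a re-iterable collection of its elements.

```python
from math import gcd, log2

def small_power(mul, identity, x, e):
    """x**e by multiplying e copies of x; e stays O((log n)^2) here."""
    result = identity
    for _ in range(e):
        result = mul(result, x)
    return result

def prime_factors(m):
    """Prime factors of m with multiplicity, by trial division."""
    factors, d = [], 2
    while d * d <= m:
        while m % d == 0:
            factors.append(d)
            m //= d
        d += 1
    if m > 1:
        factors.append(m)
    return factors

def choose_basis(n):
    """Step 1: first D > n, square-free, coprime to n, prime factors < 2 log n."""
    bound = 2 * log2(n)
    for D in range(n + 1, n * n):
        fs = prime_factors(D)
        if len(set(fs)) == len(fs) and gcd(D, n) == 1 and all(d < bound for d in fs):
            return D, fs
    raise ValueError("no suitable D below n^2 (n is too small for this sketch)")

def group_power(mul, identity, elements, n, a, r):
    """a**r in a group of order n, using only short products as in Lemma 1."""
    D, ds = choose_basis(n)
    inverse_a = next(x for x in elements if mul(a, x) == identity)
    # Step 2: a_i = a^floor(n/d_i) is the unique x with x^d_i = a^(-n_i).
    a_i = []
    for d in ds:
        target = small_power(mul, identity, inverse_a, n % d)
        a_i.append(next(x for x in elements
                        if small_power(mul, identity, x, d) == target))
    # Step 3: Chinese remaindering on f = floor(r D / n) gives the exponents.
    f = (r * D) // n
    us = [(f * pow(D // d, -1, d)) % d for d in ds]
    u = (r - sum(ui * (n // d) for ui, d in zip(us, ds))) % n
    assert u < 2 * log2(n) ** 2 + 1  # the bound derived in the proof
    result = small_power(mul, identity, a, u)
    for x, ui in zip(a_i, us):
        result = mul(result, small_power(mul, identity, x, ui))
    return result
```

For the multiplicative group modulo the prime 1009 (order n = 1008), one can call `group_power(lambda x, y: x * y % 1009, 1, range(1, 1009), 1008, 11, r)` and compare the result with Python's built-in `pow(11, r, 1009)`. Every exponent passed to `small_power` is below $2(\log n)^2 + 1$, which is the point of the lemma.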



Because FO is closed under polynomial change in input size, and the product
of $\log(n^k) = k \log n$ group elements is FO reducible to the product of log n group
elements, we have

Corollary 1. Finding powers in any group of order $n^k$ is FO Turing-reducible
to finding the product of log n elements.

Representing a group of order $n^k$ means representing elements as k-tuples of
universe elements, and representing the product operation in FO.

We now apply this to the integers modulo p, where $p = O(n^k)$ is a prime. The
multiplicative group $Z_p^*$ contains the p − 1 integers 1, . . . , p − 1, and multiplication
in this group is clearly first-order definable from multiplication and addition on
0, . . . , n. If a in POW(a, r, b, p) is zero, then we only need to check that b is zero.
Otherwise, we find $a^r$ in the multiplicative group $Z_p^*$. The product of log n group
elements can be computed with $\mathrm{IMULT}_{k \log n}$ and $\mathrm{DIVISION}_{k \log^2 n}$, so we have
the main lemma of this section:



Lemma 2. POW is FO-Turing reducible to $\mathrm{IMULT}_{O(\log n)}$ and $\mathrm{DIVISION}_{O((\log n)^2)}$.



6 $\mathrm{DIVISION}_{(\log n)^{O(1)}}$ and $\mathrm{IMULT}_{(\log n)^{O(1)}}$ Are in FO(M)

Since our end result is that DIVISION and IMULT are in FO(M), it should 
be no surprise that the logarithmically smaller versions $\mathrm{DIVISION}_{(\log n)^{O(1)}}$ and
$\mathrm{IMULT}_{(\log n)^{O(1)}}$ are in FO(M). We will prove that these smaller versions are in
FO(M) by reducing them to $\mathrm{POW}_{O(\log\log n)}$ and showing that $\mathrm{POW}_{O(\log\log n)}$
is in FO.





Just as we have introduced scaled versions of IMULT and DIVISION, we use 
a scaled version of POW: 

$$\mathrm{POW}_{k \log\log n}(a, r, b, p) = \mathrm{POW}(a, r, b, p) \wedge a, r, b, p < 2^{k \log\log n}$$

The FO(M) Turing reduction of $\mathrm{IMULT}_{n^{O(1)}}$ to POW = $\mathrm{POW}_{O(\log n)}$
shown by Allender et al. [2] scales to become an FO(M) Turing reduction of
$\mathrm{IMULT}_{(\log n)^{O(1)}}$ to $\mathrm{POW}_{O(\log\log n)}$. This can be seen as follows: consider the
FO(M) reduction on the problem with $(\log n)^{O(1)}$ input size, which is a Turing
reduction using FO(M) formulas over the universe with $(\log n)^{O(1)}$ elements to the
correspondingly scaled version of POW, $\mathrm{POW}_{O(\log\log n)}$. But any FO(M) formula
over the smaller universe can be simulated by an FO(M) formula over the larger
universe, so this is an FO(M) reduction from $\mathrm{IMULT}_{(\log n)^{O(1)}}$ to $\mathrm{POW}_{O(\log\log n)}$.
Showing that $\mathrm{POW}_{O(\log\log n)}$ is in FO can be done directly. Suppose the
modulus p, the exponent r, and the base a all have fewer than k log log n bits.
The numbers $a_i = a^{\lfloor r/2^i \rfloor} \bmod p$, with i ranging from 0 to k log log n, can be
guessed simultaneously, since there are k log log n of them, each with k log log n
bits. An existential choice of a number x from 0 to n − 1 can be thought of as
a non-deterministic simultaneous guess of log n bits, so we can certainly
simultaneously guess $(k \log\log n)^2$ bits. There is exactly one choice of the numbers
$a_1, \ldots, a_{k \log\log n}$ such that the following conditions hold:

$$a_{k \log\log n} = a^0 = 1 \quad \text{and} \quad a_i = a_{i+1}^2 \, a^{r_i} \pmod{p} ,$$

where $r_i$ is bit i of r.

Extracting the numbers $a_i$ out of our log n bit choice x and checking that
they meet the above conditions can be done with an FO formula. Extracting $a_0$
gives us $a^r \bmod p$.
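
The guess-and-check argument can be mirrored in Python. In this sketch, `power_chain` builds the unique sequence deterministically (where the formula would guess it existentially) and `is_valid_chain` plays the role of the first-order check of the conditions. Both names are hypothetical, and `bits` stands for the bound k log log n on the length of r.

```python
def power_chain(a, r, p, bits):
    """Return [a_0, ..., a_bits] with a_i = a^floor(r / 2^i) mod p."""
    assert 0 <= r < 2 ** bits
    chain = [0] * (bits + 1)
    chain[bits] = 1 % p                      # a_bits = a^0 = 1
    for i in range(bits - 1, -1, -1):
        r_i = (r >> i) & 1                   # bit i of the exponent
        chain[i] = (chain[i + 1] ** 2 * a ** r_i) % p
    return chain

def is_valid_chain(chain, a, r, p):
    """Check the conditions a_bits = 1 and a_i = a_{i+1}^2 * a^{r_i} (mod p)."""
    bits = len(chain) - 1
    if chain[bits] != 1 % p:
        return False
    return all(chain[i] == (chain[i + 1] ** 2 * a ** ((r >> i) & 1)) % p
               for i in range(bits))
```

The first entry of a valid chain is $a^r \bmod p$, and changing any single entry makes the check fail, which reflects the uniqueness claim in the text.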

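Thus we have concluded that $\mathrm{POW}_{O(\log\log n)}$ is in FO. Since we have an
FO(M) Turing reduction from $\mathrm{IMULT}_{(\log n)^{O(1)}}$ and $\mathrm{DIVISION}_{(\log n)^{O(1)}}$ to
$\mathrm{POW}_{O(\log\log n)}$, we can conclude

Theorem 1. $\mathrm{IMULT}_{(\log n)^{O(1)}}$ and $\mathrm{DIVISION}_{(\log n)^{O(1)}}$ are in FO(M).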

7 DIVISION and IMULT Are in FO(M) 

Since we have an FO Turing reduction from POW to $\mathrm{IMULT}_{O(\log n)}$ and
$\mathrm{DIVISION}_{O((\log n)^2)}$, we can conclude that we have an FO(M) formula for POW.
Finally, using the FO(M) Turing reduction from IMULT and DIVISION to 
POW, we arrive at our main result. 

Theorem 2. Iterated multiplication of n n-bit numbers and division of two n-bit
numbers are in FO(M).

By the equivalence of FO(M) to FO-uniform TC^0, we have

Corollary 2. Iterated multiplication of n n-bit numbers and division of two n-bit
numbers are in FO-uniform TC^0.

As both of these classes are closed under polynomial change in the input size,
these results also hold for inputs with $n^{O(1)}$ bits.

8 POW Is in FO

An additional result of the theorem that IMULT and DIVISION are in FO(M)
is that $\mathrm{IMULT}_{(\log n)^{O(1)}}$ and $\mathrm{DIVISION}_{(\log n)^{O(1)}}$ are in FO. This is because any
FO(M) formula over a universe 0, . . . , log n has an equivalent FO formula over
the universe 0, . . . , n.

The fact that FO is closed under the introduction of counting quantifiers with
polylogarithmic bounds is established in [1,7]. Since $\mathrm{IMULT}_{(\log n)^{O(1)}}$ is
equivalent to IMULT with input size $(\log n)^{O(1)}$, it is expressed by an FO(M) formula
over 0, . . . , log n. Therefore, $\mathrm{IMULT}_{(\log n)^{O(1)}}$ is expressed by an FO formula, and
similarly $\mathrm{DIVISION}_{(\log n)^{O(1)}}$ is in FO, and we have

Theorem 3. $\mathrm{IMULT}_{(\log n)^{O(1)}}$ and $\mathrm{DIVISION}_{(\log n)^{O(1)}}$ are in FO.

This theorem gives us a tight bound on the size of cases of IMULT that are
in FO. Since we know that $\mathrm{PARITY}_{f(n)}$ is in FO iff $f(n) = (\log n)^{O(1)}$ from
Håstad [9], and PARITY is easily FO many-one reducible to multiplication of
two numbers, which is FO many-one reducible to IMULT of the same size, we
can conclude that $\mathrm{IMULT}_{f(n)}$ is in FO iff $f(n) = (\log n)^{O(1)}$.

Since our proof that POW was in FO(M) included an FO Turing reduction
from POW to $\mathrm{DIVISION}_{O((\log n)^2)}$ and $\mathrm{IMULT}_{O(\log n)}$, and we now have FO
formulas expressing $\mathrm{DIVISION}_{O((\log n)^2)}$ and $\mathrm{IMULT}_{O(\log n)}$, we can now conclude
that POW is in FO. Since the restriction that the inputs to POW have O(log n)
bits is equivalent to requiring that the inputs be in the range 0, . . . , n, we have
our second main result.

Theorem 4. The predicate POW(a, r, b, p), which is true iff $a^r \equiv b \pmod{p}$,
with p prime, can be expressed by an FO formula over the universe 0, . . . , n, if
a, r, b, p < n.

This result can be extended to exponentiation modulo any small number n,
not just modulo a prime. We can see that the equation

$$a^r \equiv b \pmod{n}$$

is true if and only if it is true modulo all the prime power factors of n:

$$a^r \equiv b \pmod{p^i} \quad \forall\, p^i \mid n .$$

We can show that for a relatively prime to $p^i$, a is in the group $Z_{p^i}^*$, and the
above proof can be applied. If p divides a, then if r > log n, $a^r \equiv 0 \pmod{p^i}$.
If r < log n, then $\mathrm{IMULT}_{(\log n)^{O(1)}}$ can be applied. Since the prime power factors
of a small number n can be found in FO, we have

Corollary 3. The predicate $a^r \equiv b \pmod{n}$, with the inputs written in unary,
is in FO.
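
The reduction from a composite modulus to its prime power factors is just the Chinese remainder theorem, and can be checked numerically with a short Python sketch. The names `prime_power_factors` and `pow_mod_composite` are illustrative, and the sketch uses Python's built-in modular exponentiation in place of the first-order formula for each prime power.

```python
def prime_power_factors(n):
    """Maximal prime powers dividing n, e.g. 360 -> [8, 9, 5]."""
    out, d = [], 2
    while d * d <= n:
        if n % d == 0:
            q = 1
            while n % d == 0:
                q *= d
                n //= d
            out.append(q)
        d += 1
    if n > 1:
        out.append(n)
    return out

def pow_mod_composite(a, r, b, n):
    """a^r = b (mod n), decided separately modulo each prime power factor."""
    return all(pow(a, r, q) == b % q for q in prime_power_factors(n))
```

When p divides a and the exponent is large enough, the residue modulo $p^i$ is simply 0 (for instance `pow(6, 10, 8)` is 0), which is the special case the text handles separately.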

Finally, note that the property that any predicate expressible in FO over the
universe 0, . . . , $n^k$ is expressible in FO over 0, . . . , n lets us conclude that the
predicate $a^r \equiv b \pmod{n}$ is in FO if the inputs have O(log n) bits, but not that
it is in FO with inputs of $(\log n)^{O(1)}$ bits. This is different from the results we
have for $\mathrm{IMULT}_{(\log n)^{O(1)}}$ and $\mathrm{DIVISION}_{(\log n)^{O(1)}}$.

9 Conclusions

Our main theorem states that division and iterated multiplication are in fully
uniform TC^0. This is significant on its own and also because it eliminates the
most important example of a problem known to be in a circuit complexity class,
but not known to be in the corresponding uniform complexity class.

We also proved that exponentiation modulo a number is in FO when the
inputs have O(log n) bits. This result was quite unexpected, since the problem
was previously not even known to be in FO(M). It remains unknown if
exponentiation modulo a number with $(\log n)^{O(1)}$ bits is in FO, or even in FO(M).

Finally, we have found a tight bound on the size of division and iterated
multiplication problems that are in FO. We now know that these problems are
in FO if and only if their inputs have $(\log n)^{O(1)}$ bits. Instances of the problems
with larger inputs are known not to be in FO.

Acknowledgments. These results were found while working on [2] with Eric 
Allender and David Mix Barrington, who generously urged me to publish them 
separately.

Abstract. In this paper we investigate automated methods for
externalizing internal memory data structures. We consider a class of balanced
trees that we call weight-balanced partitioning trees (or wp-trees) for
indexing a set of points in $R^d$. Well-known examples of wp-trees include
kd-trees, BBD-trees, pseudo-quad-trees, and BAR-trees. Given an efficient
external wp-tree construction algorithm, we present a general framework
for automatically obtaining a dynamic external data structure. Using this
framework together with a new general construction (bulk loading)
technique of independent interest, we obtain data structures with guaranteed
good update performance in terms of I/O transfers. Our approach gives
considerably improved construction and update I/O bounds for e.g.
external kd-trees and BBD-trees.



1 Introduction 

Both in the database and algorithm communities, much attention has recently 
been given to the development of I/O-efficient external data structures for in- 
dexing point data. A large number of external structures have been developed, 
reflecting the many different requirements put on such structures; small size, ef- 
ficient query and update bounds, capability of answering a wide range of queries 
(mainly range and proximity queries), and simplicity. See the recent surveys.
The proposed data structures can roughly be divided into two classes, namely
simple and practical (often heuristics based) structures, for which worst-case 
query performance guarantees can only be given in the static case (if at all), and 
theoretically optimal but usually complicated dynamic structures. The first class 
of structures are often external versions of well-known simple internal memory 
structures. 

In this paper, we develop a general mechanism for obtaining efficient exter- 
nal data structures from a general class of simple internal memory structures, 
such that the external structures are efficient in the dynamic case. Part of our 
result is a new general index construction (bulk loading) technique which is of 
independent interest. 

1.1 Computational Model and Previous Results 

In this paper we analyze data structures in the standard two-level external
memory model defined by the following parameters: N, the number of input
elements, M, the number of elements that fit in main memory, and B, the
number of elements that fit in one disk block, where $N \gg M$ and $1 \le B \le M/2$.
One I/O operation (or simply I/O) in this model consists of reading one block of 
contiguous elements from disk into main memory or writing one block from main 
memory to disk. The measure of performance of an algorithm or data structure is 
the number of I/O operations it performs and the maximum disk space (blocks) 
it uses. For notational simplicity, we use n = N/B and m = M/B to denote the 
input size and memory size in units of data blocks. 

Aggarwal and Vitter developed algorithms for sorting a set of N elements
in external memory in optimal $O(n \log_m n)$ I/Os. Subsequently, I/O-efficient
algorithms have been developed for a large number of problems. Recently, many
provably efficient (and often optimal) external data structures have also been
developed. Ideally, an external data structure should use linear space, O(n) blocks,
and answer a query in $O(\log_B N + K/B)$ I/Os, where K is the number of elements
reported by the query. These bounds are obtained by the B-tree data structure
for one-dimensional range searching. For two-dimensional range searching,
$O(\sqrt{n} + K/B)$ is the best obtainable query bound with linear space.
Structures that use more than linear space are often infeasible in practical
applications. Below we discuss the known external memory data structures most
relevant to this paper. See the recent surveys for a complete account of known results.
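
To get a feel for these bounds, the expressions can be evaluated for concrete parameters. The Python sketch below ignores the constants hidden in the O-notation; the function names are hypothetical and the parameter values used in the example are made up for illustration.

```python
from math import log

def io_parameters(N, M, B):
    """Return (n, m): input size and memory size in units of blocks."""
    n = -(-N // B)        # ceiling of N / B without floating point
    m = M // B
    return n, m

def sort_io_bound(N, M, B):
    """The expression n * log_m(n) from the sorting bound (constants dropped)."""
    n, m = io_parameters(N, M, B)
    return n * log(n, m)

def query_io_bound(N, B, K):
    """The expression log_B(N) + K / B from the ideal query bound."""
    return log(N, B) + K / B
```

With N = 2^30 elements, M = 2^20, and B = 2^10, the input occupies n = 2^20 blocks, memory holds m = 2^10 blocks, and the sorting expression evaluates to about 2n, that is, roughly two passes over the data.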

One main challenge in the design of external data structures is obtaining
good query performance in a dynamic environment. Early structures, such as the
grid file, the various quad-trees, and the kdB-tree were poorly
equipped to handle updates. Later structures tried to employ various (heuristic)
techniques to preserve the query performance and space usage under updates.
They include the LSD-tree, the buddy tree, the hB-tree, and R-tree
variants. These data structures are often
the methods of choice in practical applications because they use linear space
and reportedly perform well in practice. However, in a dynamic environment,
the query time is high in the worst case. The hB-tree (or holey brick tree),
for example, is based on the statically query-efficient kdB-tree, which combines
the spatial query capabilities of the kd-tree with the I/O-efficiency of the
B-tree. While nodes in a kdB-tree represent rectangular regions of the space,
nodes in an hB-tree represent so-called “holey bricks,” or rectangles from which
smaller rectangles have been cut out. This allows for the underlying B-tree to be
maintained during updates (insertions). Unfortunately, a similar claim cannot
be made about the underlying kd-tree, and thus good query-efficiency cannot be
maintained.

Recently, a number of theoretical worst-case efficient dynamic external data
structures have been developed. The cross-tree and the O-tree, for example,
both use linear space, answer range queries in the optimal number of I/Os,
and can be updated I/O-efficiently. However, their practical efficiency has
not been investigated, probably because a theoretical analysis shows that their
average query performance is close to the worst-case performance. By contrast,
the average-case performance of the kd-tree (and the structures based on it) is
much better than the worst-case performance. Other linear-space and query
and update optimal external data structures have been designed for special types
of range queries, like 2- or 3-sided two-dimensional range queries and halfspace
range queries. The practical efficiency of these structures still has
to be established.

In the database literature, the term bulk loading is often used to refer to 
the process of constructing an external data structure. Since bulk loading an 
index using repeated insertion is often highly inefficient, the development
of specialized bulk loading algorithms has received a lot of attention recently.
Most work on bulk loading has concentrated on the R-tree.

1.2 Our Results 

In Section 2 of this paper, we define a class of linear-space trees for indexing
a set of points in $R^d$. These so-called wp-trees generalize known internal
memory data structures like kd-trees, pseudo-quad-trees, BBD-trees, and
BAR-trees. We also show how a wp-tree can be efficiently mapped to
external memory, that is, how it can be stored in external memory using O(n)
blocks such that a root-to-leaf path can be traversed I/O-efficiently. In
Section 3 we then design a general technique for bulk loading wp-trees. Using this
technique we obtain the first I/O-optimal bulk loading algorithms for kd-trees,
pseudo-quad-trees, BBD-trees and BAR-trees. Our algorithms use $O(n \log_m n)$
I/Os while previously known algorithms use at least $\Omega(n \log_2 n)$ I/Os. Finally,
in Section 4 we describe several techniques for making a wp-tree dynamic. Our
techniques are based on dynamization methods developed for internal
memory (partial rebuilding and the logarithmic method) but adapted for external
memory. Together with our external bulk loading technique, they allow us to
obtain provably I/O-efficient dynamic versions of structures like the kd-trees,
pseudo-quad-trees, BBD-trees, and BAR-trees. Previously, no such structures
were known.

2 The wp-Tree Framework

In this section, we introduce wp-trees and show how they can be mapped to external
memory. To simplify the presentation, we discuss our results in $R^2$. They can all
easily be generalized to higher dimensions.

Definition 1 A (β, δ, κ) weight-balanced partitioning tree (or wp-tree) on a set
S of N points in $R^2$ satisfies the following constraints:

1. Each node v corresponds to a region $r_v$ in $R^2$ called the extent of v. The
extent of the root node is $R^2$;

2. Each non-leaf node v has β ≥ 2 children corresponding to a partition of $r_v$
into β disjoint regions;

3. Each leaf node v stores exactly one point p from S inside $r_v$;

4. Let w(v) be the weight of node v, defined as the number of data points
stored in the subtree rooted at v, and let $v^{(\kappa)}$ be the κ'th ancestor of v.
Then $w(v) \le \delta\, w(v^{(\kappa)})$ for all nodes v and $v^{(\kappa)}$.

The wp-tree generalizes a number of internal memory data structures used to
index point data sets: kd-trees, pseudo-quad-trees, BBD-trees, and
BAR-trees are all wp-trees.

Condition 4 (the weight condition) ensures that wp-trees are balanced; only
a constant number of partition steps (κ) is required to obtain regions containing
a fraction (δ) of the points.

Lemma 1. The height of a wp-tree is at most $\kappa(\log_{1/\delta} N + 1) - 1$.

We want to store a wp-tree on disk using O(n) disk blocks so that a
root-to-leaf path can be traversed I/O-efficiently. Starting with the root v we fill disk
blocks with the subtree obtained by performing a breadth-first search traversal
from v until we have traversed at most B nodes. We recursively block the tree
starting in the leaves of this subtree. The blocked wp-tree obtained in this way
can be viewed as a fanout O(B) tree with each disk block corresponding to a
node. We call these nodes block nodes in order to distinguish them from
wp-tree nodes. The leaf block nodes of the blocked wp-tree are potentially underfull
(contain fewer than O(B) wp-tree nodes), and thus O(N) blocks are needed to
block the tree in the worst case. To alleviate this problem, we let certain block
nodes share the same disk block. More precisely, if v is a non-leaf block node, we
reorganize all of v's children that are leaf block nodes, such that at most one disk
block is non-full. This way we only use O(n) disk blocks. Since each non-leaf
block node contains a subtree of height $O(\log_2 B)$ we obtain the following.

Lemma 2. A blocked wp-tree T is a multi-way tree of height $O(\log_B N)$. It can
be stored using O(n) blocks.
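
The basic blocking step can be sketched in Python as follows. This is a simplified in-memory illustration with hypothetical names (`Node`, `block_tree`): it performs the breadth-first grouping into blocks of at most B nodes, but it does not implement the second step in which underfull leaf block nodes are packed together to share disk blocks.

```python
from collections import deque

class Node:
    """A tree node; only the child pointers matter for blocking."""
    def __init__(self, children=()):
        self.children = list(children)

def block_tree(root, B):
    """Group nodes into blocks of at most B nodes by repeated BFS.

    Each block is the top part of a subtree; nodes left in the BFS queue
    when a block fills up become the roots of further blocks.
    """
    blocks = []
    pending = [root]
    while pending:
        start = pending.pop()
        block, queue = [], deque([start])
        while queue and len(block) < B:
            v = queue.popleft()
            block.append(v)
            queue.extend(v.children)
        blocks.append(block)
        pending.extend(queue)
    return blocks
```

On a complete binary tree with 15 nodes and B = 3, the root block holds the root and its two children, and each of the four grandchildren starts a block of its own, giving five blocks in total.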

2.1 The Restricted wp-Tree 

The wp-tree definition emphasizes the structure of the tree more than the
geometry of the partitioning. The dynamization methods we will discuss in Section 4
can be applied to any wp-tree. However, without specifying the properties of
the partitioning, we cannot quantify the update and query I/O-bounds obtained 
using these methods. Therefore we now restrict the definition of a wp-tree by 
adding geometric constraints on the extent of a node and the partitioning meth- 
ods used. On the one hand the resulting class of restricted wp-trees is general 
enough to encompass many interesting data structures, and on the other hand 
it is restrictive enough to allow us to prove general bulk loading, update, and 
query bounds. 

Definition 2 A restricted (β, δ, κ) wp-tree is a (β, δ, κ) wp-tree in which each
node v satisfies the following constraints:

1. The extent of v is the region lying between two convex polygons; $r_v = b_O \setminus b_I$.
The inner polygon $b_I$ must be completely inside the outer polygon
$b_O$, and the orientations of edges forming $b_I$ and $b_O$ must be taken from a
constant set of directions D.

2. The extents of the β children of v are obtained from $r_v$ by applying the
following cut rules a constant number of times:

a) A geometric cut (ℓ). A geometric cut is a line ℓ along a direction e ∈ D
not intersecting $b_I$.

b) A rank cut (e, α). A rank cut is a line ℓ along a direction e ∈ D. Let ℓ'
be the line along e such that αw(v) of the w(v) points corresponding to
v are to the left of ℓ'. Then ℓ is the closest line to ℓ' not intersecting the
interior of $b_I$.

c) A rectangle cut. A rectangle cut can be applied to v only if $b_I$ and $b_O$ are
both fat rectangles (i.e., the aspect ratio is at most 3) and $2b_I \subset b_O$. A
rectangle cut is a fat rectangle b' such that $b_I \subset b' \subset b_O$ and both $b' \setminus b_I$
and $b_O \setminus b'$ contain at most 2w(v)/3 points.

2.2 Examples of Restricted wp-Trees

Like wp-trees, restricted wp-trees generalize internal memory data structures
like kd-trees, BBD-trees, pseudo-quad-trees, and BAR-trees. Below we further
discuss kd-trees and BBD-trees. In the full paper we show how pseudo-quad-trees
and BAR-trees are also captured by the restricted wp-tree definition.

The kd-tree. Introduced by Bentley [?], the kd-tree is a classical structure
for answering range (or window) queries. It is a binary tree that represents a
recursive decomposition of the space by means of hyperplanes orthogonal to the
coordinate axes. In $\mathbb{R}^2$ the partition is by axes-orthogonal lines. Each partition
line divides the point-set into two equal sized subsets. On even levels of the tree
the line is orthogonal to the $x$-axis, while on odd levels it is orthogonal to the
$y$-axis. These partitions are rank cuts $(e, 1/2)$, where $e$ is orthogonal to the $x$- or
$y$-axis. Thus the kd-tree is a restricted $(2, 1/2, 1)$ wp-tree.
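
The median partitioning just described can be sketched in a few lines. The following is an internal-memory illustration only, not the blocked external structure of this paper; the names `KdNode`, `build_kd`, and `height` are hypothetical, and distinct coordinates are assumed so that each rank cut $(e, 1/2)$ splits the points exactly in half.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class KdNode:
    """A node of an in-memory kd-tree (illustrative, not the blocked variant)."""

    def __init__(self, points: List[Point], axis: Optional[int] = None,
                 cut: Optional[float] = None,
                 left: "Optional[KdNode]" = None,
                 right: "Optional[KdNode]" = None) -> None:
        self.points = points  # points of the subtree, kept only for inspection
        self.axis = axis      # 0: cut line orthogonal to the x-axis, 1: to the y-axis
        self.cut = cut        # coordinate of the cut line
        self.left = left
        self.right = right


def build_kd(points: List[Point], depth: int = 0) -> KdNode:
    """Split at the median, alternating between x (even levels) and y (odd levels)."""
    if len(points) <= 1:
        return KdNode(points)
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2  # rank cut with alpha = 1/2
    return KdNode(pts, axis, pts[mid][axis],
                  build_kd(pts[:mid], depth + 1),
                  build_kd(pts[mid:], depth + 1))


def height(node: KdNode) -> int:
    """Number of edges on the longest root-to-leaf path."""
    if node.left is None:
        return 0
    return 1 + max(height(node.left), height(node.right))
```

With 16 points in general position the sketch produces a tree of height $\log_2 16 = 4$, each level halving the weight of every node.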

The BBD-tree. The balanced box decomposition tree, or BBD-tree, was introduced
by Arya et al. [?] for answering approximate nearest-neighbor queries. Like
the kd-tree, the BBD-tree is a binary tree representing a recursive decomposition
of the space. The region associated with a BBD-tree node is the set theoretic
difference of two fat rectangles, $b_I$ and $b_O$ (with $b_I$ included in $b_O$). More
precisely, a BBD-tree consists of two types of nodes: split nodes and shrink nodes.
In a split node, the partition is done using an axis-orthogonal line that cuts the
longest side of $b_O$ so that the resulting rectangles are fat and $b_I$ lies entirely
inside one of them (refer to Figure 1(a)). In a shrink node $v$, the partition is
done using a box rather than a line. This box $b$ lies inside $b_O$, contains $b_I$, and
determines the extent of the two children: $b \setminus b_I$ is the extent of the inner child
and $b_O \setminus b$ is the extent of the outer child (refer to Figure 1(b)). While split
nodes reduce the geometric size, the box $b$ used in shrink nodes is chosen so as
to reduce the number of points by a factor of 1.5. By alternating split nodes
and shrink nodes, both the geometric size and the number of points associated
with each node decrease by a constant factor as we descend a constant number
of levels in the BBD-tree (see [?] for details). It is easy to see that the split node
uses a geometric cut and the shrink node uses a rectangle cut. In the full paper
we show that a BBD-tree is a restricted $(2, 2/3, 3)$ wp-tree.

Fig. 1. BBD-tree partitions. (a) Split node. (b) Shrink node.

3 Bulk Loading Restricted wp-Trees 

In this section we describe an optimal algorithm for bulk loading (constructing) 
a blocked restricted wp-tree. 

It is natural to bulk load a wp-tree using a top-down approach. For example,
to construct a kd-tree on $N$ points in $\mathbb{R}^2$ we first find the point with the median
$x$-coordinate in $O(n)$ I/Os [?]. We then distribute the points into two sets based
on this point and proceed recursively in each set, alternating between using
the median $x$-coordinate and $y$-coordinate to define the distribution. This way
each level of the wp-tree is constructed in a linear number of I/Os, so in total
we use $O(n \log_2 n)$ I/Os to bulk load the tree. This is a factor of $\log_2 m$ larger
than the optimal $O(n \log_m n)$ bound (the sorting bound). Intuitively, we need to
construct $\Theta(\log_2 m)$ levels of the wp-tree in a linear number of I/Os, instead of
just one, in order to obtain this bound. Doing so seems difficult because of the
way the points are alternately split by $x$- and $y$-coordinates. Nevertheless, below
we show how to bulk load a blocked restricted wp-tree, and thus a kd-tree, in
$O(n \log_m n)$ I/Os.
To simplify the presentation, we present our restricted wp-tree bulk loading
algorithm only for the case where $\beta = 2$ and where $D$ contains the two
directions orthogonal to the coordinate axes. The details of the general algorithm
will be given in the full paper. Let $S$ be a set of $N$ points in $\mathbb{R}^2$. The
first step in constructing a blocked wp-tree for $S$ is to sort the $N$ points twice:
once according to their $x$-coordinate, and once according to their $y$-coordinate.
Call the resulting ordered sets $S_x$ and $S_y$, respectively. Next a recursive procedure
Bulk_load is called on $S_x$ and $S_y$. Bulk_load builds a subtree of height
$O(\log_2 m)$ in each recursive call. The main idea in the algorithm is to impose a
grid on the set of input points and count the number of points in each grid cell.
The grid counts allow us to compute partitions without reading all the points.
More precisely, Bulk_load starts by dividing the current region (initially $\mathbb{R}^2$)
into $t = \Theta(\min\{m, \sqrt{M}\})$ vertical slabs and $t$ horizontal slabs, each containing
$N/t$ points (refer to Figure 2(a)). These slabs form a $t \times t$ grid. Note that the
grid size is at most $M$, and thus the grid can fit into internal memory. The
number of points in each grid cell is then computed and stored in a matrix $A$
in main memory. All three types of cuts can now be computed efficiently using
$A$. A rank cut $(e, \alpha)$ for a node $v$, for example, is computed by first finding the
slab $E_k$ along $e$ containing the cutting line. This can be done without performing
I/Os. The exact cutting line can then be computed in $O(N/(Bt))$ I/Os by
scanning the points in $E_k$. After a subtree $T$ of height $O(\log_2 t)$ is built, $S_x$ and
$S_y$ are distributed into $t$ sets each, corresponding to the leaves of $T$, and the rest
of the tree is built by calling Bulk_load recursively. The detailed Bulk_load
procedure is given below.

procedure Bulk_load($S_x$, $S_y$, $v$)

1. Divide $S_x$ into $t$ sets, corresponding to $t$ vertical slabs $X_1, \ldots, X_t$, each containing
$|S_x|/t$ points. Store the $t + 1$ boundary $x$-coordinates in memory.

2. Divide $S_y$ into $t$ sets, corresponding to $t$ horizontal slabs $Y_1, \ldots, Y_t$, each containing
$|S_y|/t$ points. Store the $t + 1$ boundary $y$-coordinates in memory.

3. The vertical and horizontal slabs form a grid. Let $C_{i,j}$ be the set of points in the
grid cell formed at the intersection of the $i$th horizontal slab and the $j$th vertical
slab. Create a $t \times t$ matrix $A$ in memory. Scan $S_x$ and compute the grid cell
counts: $A_{i,j} = |C_{i,j}|$, $1 \le i, j \le t$.

4. Let $u = v$.

5. a) If $u$ is partitioned using a geometric cut orthogonal to the $x$-axis, determine
the slab $X_k$ containing the cut line $\ell$ using the boundary $x$-coordinates. Next
scan $X_k$ and, for each cell $C_{j,k}$, $1 \le j \le t$, compute the counts of "subcells"
$C^<_{j,k}$ and $C^>_{j,k}$ obtained by splitting cell $C_{j,k}$ at $\ell$ (refer to Figure 2(b)). Store
these counts in main memory by splitting the matrix $A$ into $A^<$ and $A^>$,
containing the first $k$ columns and the last $(t - k + 1)$ columns of $A$, respectively
(column $k$ from matrix $A$ appears in both $A^<$ and $A^>$). Then let $A^<_{j,k} = |C^<_{j,k}|$
and $A^>_{j,1} = |C^>_{j,k}|$, $1 \le j \le t$. Go to 5.(d).

b) If $u$ is partitioned using a rank cut orthogonal to the $x$-axis, first determine
the slab $X_k$ containing the cut line $\ell$ using $A$, then scan $X_k$ to determine the
exact position of the cut line. Next split $A$ into $A^<$ and $A^>$ as above. A cut
orthogonal to the $y$-axis is handled similarly. Go to 5.(d).
Fig. 2. Finding the median using the grid cells. (a) Slab $X_k$ contains $N/t$ points.
(b) $A^<$ and $A^>$ are computed by splitting $X_k$ along $\ell$.



c) If $u$ is partitioned using a rectangle cut, use the following algorithm to determine
the sides of $b'$. Let $\ell$ be a line orthogonal to the longest side of $b_O$ that
cuts $b_O$ into two fat rectangles and does not intersect $b_I$. Using only the grid
cell counts, decide whether any of the two new regions contains more than
$2w(u)/3$ points. If this is the case, then repeat the process in that region.
Otherwise, the region with the largest number of points becomes $b'$. Scan the
(up to) four slabs that contain the sides of $b'$ and compute the counts of the
"subcells". These counts will be stored in $A^<$, the cell count matrix for $b' \setminus b_I$,
and $A^>$, the cell count matrix for $b_O \setminus b'$. Go to 5.(d).

d) Create a new wp-tree node for each of the two regions constructed. For each
of these two nodes, determine its partition by repeating step 5, in which the
role of $A$ is played by $A^<$ or $A^>$. Stop when reaching level $\log_2 t$.

6. Scan $S_x$ and $S_y$ and distribute the $N$ points into $t$ pairs of sets $(S^i_x, S^i_y)$, corresponding
to the $t$ leaves $v_i$ of $T$.

7. For each pair of sets $(S^i_x, S^i_y)$ do the following. If $(S^i_x, S^i_y)$ fits in memory, then
construct the remaining wp-tree nodes. Otherwise, recursively call Bulk_load on
$(S^i_x, S^i_y, v_i)$.
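
The role of the count matrix in steps 1 to 3 and 5.(b) can be illustrated with a small in-memory sketch. It is not the external-memory procedure itself: the helper names `slab_boundaries`, `grid_counts`, and `slab_of_rank_cut` are hypothetical, only the interior slab boundaries are stored, and a point lying exactly on a boundary is assigned (by assumption) to the slab on its right.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def slab_boundaries(sorted_coords: Sequence[float], t: int) -> List[float]:
    """Interior boundaries that split the sorted coordinates into t equal slabs."""
    n = len(sorted_coords)
    return [sorted_coords[(k * n) // t] for k in range(1, t)]


def grid_counts(points: Sequence[Tuple[float, float]],
                xb: List[float], yb: List[float]) -> List[List[int]]:
    """One scan over the points; A[i][j] counts row (horizontal slab) i, column j."""
    t = len(xb) + 1
    A = [[0] * t for _ in range(t)]
    for x, y in points:
        A[bisect_right(yb, y)][bisect_right(xb, x)] += 1
    return A


def slab_of_rank_cut(A: List[List[int]], alpha: float) -> Tuple[int, int]:
    """Find the vertical slab containing the rank cut using the counts only.

    Returns (k, before), where `before` is the number of points in slabs
    0..k-1; the exact cut is then found by scanning slab k alone.
    """
    t = len(A)
    column_sums = [sum(A[i][k] for i in range(t)) for k in range(t)]
    target = alpha * sum(column_sums)
    before = 0
    for k, col in enumerate(column_sums):
        if before + col > target:
            return k, before
        before += col
    return t - 1, before - column_sums[-1]  # alpha = 1: cut at the last slab
```

Only the slab returned by `slab_of_rank_cut` has to be read again to fix the exact cut line, which is what keeps the cost of one cut proportional to the size of a single slab.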

Theorem 1. A blocked restricted wp-tree can be bulk loaded in $O(n \log_m n)$
I/Os.

Proof. First note that sorting the points takes $O(n \log_m n)$ I/Os. Once sorted,
the points are kept sorted throughout the recursive calls to the Bulk_load procedure.
Next note that the choice of $t = \Theta(\min\{m, \sqrt{M}\}) = O(\sqrt{M})$ means
that the original $t \times t$ count matrix $A$ fits in memory. In fact, since each of the
$2^{\log_2 t} = t$ nodes built in one call to Bulk_load adds at most $t$ counts, all count
matrices produced during one such call fit in memory.

Now consider one call to Bulk_load. Steps 1, 2 and 3 of Bulk_load are linear
scans of the input sets $S_x$ and $S_y$ using $O(n)$ I/Os. Step 6 can also be performed
in $O(n)$ I/Os since $S_x$ and $S_y$ are distributed into $t = \Theta(\min\{m, \sqrt{M}\}) = O(m)$
sets (which means that one block for each of the sets can be maintained in
memory during the distribution). Step 5 (recursively) computes a subtree of
height $\log_2 t$, using a different algorithm for each of the three partition types.
A geometric or rank cut (Step 5.(a) or 5.(b)) can be computed in $O(|S_x|/t)$
I/Os since slab $X_k$ is scanned at most three times. Similarly, a rectangle cut
(Step 5.(c)) can also be computed in $O(|S_x|/t)$ I/Os. The details of this argument
will be given in the full paper. It can also be proven that a rectangle cut always
exists [7]. Summing up over the $O(t)$ nodes built, we obtain that Step 5
can be performed in $O(n)$ I/Os.

Since a subtree of height $\Theta(\log_2 t) = \Theta(\log_2 m)$ can be built in a linear number
of I/Os (one call to Bulk_load), the cost of building the entire blocked restricted
wp-tree is $O\left(n \frac{\log_2 n}{\log_2 m}\right) = O(n \log_m n)$ I/Os.
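
To get a feeling for the gap between the level-by-level construction and the grid-based one, the two bounds can be evaluated numerically. This is a back-of-the-envelope sketch that ignores all constant factors; the function name `io_estimates` and the sample parameter values are illustrative assumptions, with $n = N/B$ and $m = M/B$ as in the text.

```python
import math
from typing import Tuple


def io_estimates(N: int, M: int, B: int) -> Tuple[float, float]:
    """Return (n log2 n, n log_m n) with n = N/B and m = M/B, constants ignored."""
    n, m = N / B, M / B
    top_down = n * math.log2(n)  # one tree level per linear pass
    grid = n * math.log(n, m)    # Theta(log2 m) tree levels per linear pass
    return top_down, grid
```

For example, with $N = 2^{30}$ points, $M = 2^{20}$ and $B = 2^{10}$ we have $n = 2^{20}$ and $m = 2^{10}$, so the two estimates differ by the factor $\log_2 m = 10$.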



Corollary 1. A kd-tree, BBD-tree, BAR-tree or pseudo-quad-tree can be bulk
loaded in $O(n \log_m n)$ I/Os.



4 The Dynamization Framework 

In this section we present a framework for making wp-trees dynamic. We present
three methods: the first one takes advantage of the weight balancing property of
wp-trees and uses partial rebuilding to maintain the tree balanced [?], and the
other two methods are based on the so-called logarithmic method [?]. All
three methods take advantage of the improved bulk loading bounds obtained
in the previous section. While the methods are not new, we show how their
application to blocked restricted wp-trees produces new dynamic data structures
for indexing points in $\mathbb{R}^2$ that are competitive with or better than existing data
structures in terms of I/O performance. The choice of method for a given data
structure depends on its update and query bounds as well as the application the
external structure is to be used in.



4.1 Partial Rebuilding 

In the definition of a $(\beta, \delta_0, \kappa)$ wp-tree, the weight condition is satisfied by any
$\delta > \delta_0$. This method of relaxing the weight condition allows us to perform
updates with good amortized complexity. A node $v$ is said to be out of balance
if there is another node $u$ such that $p^{(\kappa)}(u) = v$ and $w(u) > \delta w(v)$. In other words,
a node is out of balance if one of its descendants is too heavy. A node $v$ is said
to be perfectly balanced if all nodes $u$ such that $p^{(\kappa)}(u) = v$ satisfy $w(u) \le \delta_0 w(v)$.

In order to allow dynamic updates on a blocked wp-tree, we employ a partial
rebuilding technique, used by Overmars [?] to dynamically maintain quad-trees
and kd-trees balanced, and first adapted to external memory by Arge and Vitter [?].
When inserting a new point into the data structure, we first insert it in
the appropriate place among the leaves, and then we check for nodes on the path
to the root that are out of balance. If $v$ is the highest such node, we rebuild the
whole subtree rooted at $v$ into a perfectly balanced tree. In the full paper we
prove the following.
Theorem 2. Let $T$ be a blocked restricted wp-tree on $N$ points. We can insert
points into $T$ in $O\left(\frac{1}{B}(\log_m n)(\log_2 n) + \log_B n\right)$ I/Os, amortized, and delete
points from $T$ in $O(\log_B n)$ I/Os, worst case. Point queries take $O(\log_B n)$ I/Os,
worst case.

As $n$ goes to infinity, the first additive term dominates the insertion bound. In
practice, however, we expect the behavior to be consistent with the second term,
$O(\log_B n)$, because the value of $B$ is in the thousands, thus cancelling the effect
of the $\log_2 n$ factor in the first term for all practical values of $n$.
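
The partial rebuilding idea can be sketched on a much simpler structure than a blocked wp-tree. The code below is an illustration under strong assumptions, not the method analyzed in this paper: it uses an in-memory binary search tree on one-dimensional keys, takes the "too heavy descendant" to be a child (the case $\kappa = 1$), uses a single threshold `delta`, and rebuilds by splitting at the middle key. All names (`Node`, `insert`, `build_balanced`, and so on) are hypothetical.

```python
from typing import List, Optional


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.weight = 1  # number of keys stored in this subtree


def weight(v: Optional[Node]) -> int:
    return v.weight if v else 0


def height(v: Optional[Node]) -> int:
    """Edges on the longest downward path; -1 for the empty tree."""
    return -1 if v is None else 1 + max(height(v.left), height(v.right))


def flatten(v: Optional[Node], out: List[int]) -> None:
    """Append the keys of the subtree to out in sorted order."""
    if v:
        flatten(v.left, out)
        out.append(v.key)
        flatten(v.right, out)


def build_balanced(keys: List[int], lo: int = 0, hi: Optional[int] = None) -> Optional[Node]:
    """Rebuild a subtree from sorted keys by always splitting at the middle."""
    if hi is None:
        hi = len(keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    v = Node(keys[mid])
    v.left = build_balanced(keys, lo, mid)
    v.right = build_balanced(keys, mid + 1, hi)
    v.weight = hi - lo
    return v


def insert(root: Optional[Node], key: int, delta: float = 0.75) -> Node:
    """Insert key, then rebuild the highest out-of-balance node on the search path."""
    if root is None:
        return Node(key)
    path, v = [], root
    while True:
        path.append(v)
        v.weight += 1
        nxt = v.left if key < v.key else v.right
        if nxt is None:
            if key < v.key:
                v.left = Node(key)
            else:
                v.right = Node(key)
            break
        v = nxt
    for depth, v in enumerate(path):  # root first, so the first hit is the highest
        if max(weight(v.left), weight(v.right)) > delta * v.weight:
            keys: List[int] = []
            flatten(v, keys)
            rebuilt = build_balanced(keys)
            if depth == 0:
                return rebuilt
            parent = path[depth - 1]
            if parent.left is v:
                parent.left = rebuilt
            else:
                parent.right = rebuilt
            break
    return root
```

Even when keys arrive in sorted order, every node keeps at most a `delta` fraction of its parent's weight after each insertion, so the height stays logarithmic; the cost of a rebuild is charged to the insertions that unbalanced the subtree.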

4.2 Logarithmic Methods 

The main idea in the logarithmic method [?] is to partition the set of input
objects into $\log_2 N$ subsets of increasing size $2^i$, and build a perfectly balanced
data structure $T_i$ for each of these subsets. Queries are performed by querying
each of the $\log_2 N$ structures and combining the answers. Insertion is performed
by finding the first empty structure $T_i$, discarding all structures $T_j$, $0 \le j < i$,
and building $T_i$ from the new object and all the objects previously stored in $T_j$,
$0 \le j < i$. One can adapt the method to external memory by letting the $i$th
subset contain either $2^i$ blocks of points or $B^i$ points. We call the two resulting
methods the logarithmic method in base 2 and the logarithmic method in base $B$,
respectively.
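
The internal-memory version of the logarithmic method behaves like a binary counter, which the following minimal sketch makes explicit. It is only an illustration: a sorted Python list stands in for the perfectly balanced static structure $T_i$, sorting stands in for bulk loading, and the class name `LogarithmicSet` is hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional


class LogarithmicSet:
    """Insert-only set kept as static structures of sizes 2**0, 2**1, ..."""

    def __init__(self) -> None:
        # levels[i] is None (empty) or a sorted list of exactly 2**i items
        self.levels: List[Optional[List[int]]] = []

    def insert(self, x: int) -> None:
        carry = [x]
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append(None)
            if self.levels[i] is None:
                # first empty structure: rebuild it from the new object
                # and everything stored in the smaller structures
                self.levels[i] = sorted(carry)
                return
            carry += self.levels[i]  # discard T_i and carry its items upward
            self.levels[i] = None
            i += 1

    def contains(self, x: int) -> bool:
        """Query every non-empty structure and combine the answers."""
        for level in self.levels:
            if level:
                k = bisect_left(level, x)
                if k < len(level) and level[k] == x:
                    return True
        return False
```

After 13 insertions the non-empty structures have sizes 1, 4 and 8, matching the binary representation 1101 of 13.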

Logarithmic method in base 2. As mentioned, the $i$th subset contains $2^i$
blocks, or $B \cdot 2^i$ points, $0 \le i \le \log_2 n$. Queries are performed by combining the
answers from the $\log_2 n$ structures. Insertions are performed as in the internal
memory case, but we need to maintain a block in internal memory. All insertions
go into this block until the block is full, at which time the rebuilding is performed
using all points in the block. In the full paper we prove the following.

Theorem 3. A forest of perfectly balanced blocked restricted wp-trees for indexing
$N$ points can be maintained such that insertions take $O\left(\frac{1}{B}(\log_m n)(\log_2 n)\right)$
I/Os, amortized, deletions take $O((\log_B n)(\log_2 n))$ I/Os, worst case, and point
queries take $O((\log_B n)(\log_2 n))$ I/Os, worst case.

Note that, for realistic values of $n$, $m$ and $B$, we need less than one I/O, amortized,
to insert a point. This should be compared to the (at least) $O(\log_B n)$ used
in the partial rebuilding method. However, the deletion and point query bounds
of this method are worse than the bounds obtained using partial rebuilding.

Logarithmic method in base $B$. Arge and Vahrenhold used the logarithmic
method in base $B$ to obtain an I/O-efficient solution to the dynamic point location
problem [?]. Following closely the ideas of Arge and Vahrenhold, we obtain
the following.

Theorem 4. A forest of perfectly balanced blocked restricted wp-trees for indexing
$N$ points can be maintained such that insertions take $O((\log_m n)(\log_B n))$
I/Os, amortized, deletions take $O(\log_B n)$ I/Os, amortized, and point queries
take $O(\log_B^2 n)$ I/Os, worst case.
The insertion bound of the base $B$ method is a factor of $\frac{B}{\log_2 B}$ worse than
the bound obtained using the base 2 method. The deletion bound, however, is
improved by a factor of $\log_2 n$.

4.3 Applications 

We now briefly state the results we obtain when using the three dynamization
methods on our two running examples, kd-trees and BBD-trees.

The kd-tree. In the full paper we show how we can exploit a property of the
kd-tree partitioning method to obtain worst-case bounds on the number of I/Os
needed to perform a range query. We obtain the following.

Theorem 5. Using partial rebuilding, a dynamic external kd-tree can be designed,
which answers range queries in $O(\sqrt{N/B} + K/B)$ I/Os in
the worst case, where $K$ is the number of points reported. Each insertion
takes $O\left(\frac{1}{B}(\log_m n)(\log_2 n) + \log_B n\right)$ I/Os, amortized, and each deletion takes
$O(\log_B n)$ I/Os, worst case. Using the logarithmic method in base 2 (or in base
$B$), a structure with an $O(\sqrt{n} + K/B)$ worst-case range query bound can be designed.
In this case insertions take $O\left(\frac{1}{B}(\log_m n)(\log_2 n)\right)$ I/Os, amortized (or
$O((\log_m n)(\log_B n))$ I/Os, amortized), and deletions take $O((\log_B n)(\log_2 n))$
I/Os, worst case (or $O(\log_B n)$ I/Os, amortized).

Using the logarithmic methods, the query bound of the dynamic structure is 
the same as the bound for the static structure, although a logarithmic number 
of trees are queried in the worst case. This is true in general for a structure 
with polynomial query bound, because the cost to search each successive struc- 
ture is geometrically decreasing. If the query bound on the static structure is 
polylogarithmic (as in our next example), the bound on the dynamic structure
increases. 

The BBD-tree. The BBD-tree can be used to answer $(1 + \epsilon)$-approximate
nearest neighbor queries [?]. Using our dynamization methods we obtain the
following.

Theorem 6. Using partial rebuilding a dynamic external BBD-tree can 
be designed, which answers a (1 -k e)- approximate nearest neighbor query 
in Qbbd{N) = 0{c{e){log^n){log2n)/B -k log^n) I/Os, where c(e) = 
2[i+^r e- Insertions take O (;g(log^ n)(log 2 n) -k logg n) I/Os, amor- 
tized, and deletions take 0{logg n) I/Os, worst case. Using the logarithmic 
method in base 2 (or in base $B$), the query bound increases to $Q_{BBD}(N) \log_2 n$
(or $Q_{BBD}(N) \log_B n$). Insertions take $O\left(\frac{1}{B}(\log_m n)(\log_2 n)\right)$ I/Os (or
$O((\log_m n)(\log_B n))$ I/Os), amortized, and deletions take $O((\log_B n)(\log_2 n))$
I/Os, worst case (or $O(\log_B n)$ I/Os, amortized).
Abstract. This paper formulates and investigates the question of
whether a given algorithm can be coded in a way efficiently portable
across machines with different hierarchical memory systems, modeled as
$a(x)$-HRAMs (Hierarchical RAMs), where the time to access a location
$x$ is $a(x)$.

The width decomposition framework is proposed to provide a machine- 
independent characterization of temporal locality of a computation by 
a suitable set of space reuse parameters. Using this framework, it is 
shown that, when the schedule, i.e. the order by which operations are 
executed, is fixed, efficient portability is achievable. We propose (a) the
decomposition-tree memory manager, which achieves time within a loga- 
rithmic factor of optimal on all HRAMs, and (b) the reoccurrence-width 
memory manager, which achieves time within a constant factor of opti- 
mal for the important class of uniform HRAMs. 

We also show that, when the schedule is considered as a degree of freedom 
of the implementation, there are computations whose optimal schedule 
does vary with the access function. In particular, we exhibit some compu- 
tations for which any schedule is bound to be a polynomial factor slower 
than optimal on at least one of two sufficiently different machines. On 
the positive side, we show that relatively few schedules are sufficient to 
provide a near optimal solution on a wide class of HRAMs. 



1 Introduction 



In recent years, the importance of the memory hierarchy has grown considerably,
and is projected to continue growing in the future, as a result of technological
developments [MV99] as well as some fundamental physical constraints [BP97-99].
A number of studies, e.g., [AACS87, ACS87, ACS90, ACFS94, V98, FLPR99], have
investigated models of computation that explicitly capture at least some of the
hierarchical aspects of modern memory systems, proposing novel algorithms and
compiler code restructuring techniques to achieve optimal performance on
those models.

Designing efficient algorithms for the memory hierarchy is made difficult by 
the fact that performance is affected in various ways by the structure of the 
hierarchy, (e.g., by the number, size, and speed of the various levels): in fact, 
a priori, the optimal implementation of an algorithm might depend upon it 
in a way that makes it impossible to achieve optimal performance on different 
machines with the same code. In the outlined scenario, we formulate the following 
question: To what extent can a program be made efficiently portable across a class
of machines with different memory hierarchies?

In [BP00], we outline a general approach toward an analytical formulation
of the above question, as a prerequisite for a quantitative study of the issue. 
Intimately related to the investigation of portability is the machine-independent 
characterization of those properties of a computation, such as various forms of 
locality and parallelism, that determine its execution time on a memory hierar- 
chy. Of paramount importance among these properties is temporal locality which, 
informally speaking, allows a computation to be carried out by accessing only 
a small set of memory locations at a time. In this paper, we substantiate the 
technical feasibility of the approach proposed in [BP00] by an investigation on
the portability of sequential programs across hierarchies where temporal locality 
is essentially the key factor of performance. We obtain quantitative characteriza- 
tions of temporal locality and of its portability across memory hierarchies. The 
focus on sequential temporal locality is justified by its relevance and by the need 
to gain insights on the general issues in a relatively simple setting. It remains 
desirable to extend the analysis to include space locality and parallelism. 

In Section 2 we define the H-RAMs, the class of target machines considered
in our study. They are essentially uniprocessors with a random access memory 
where an access to memory location x takes time a{x), with a{x) being a generic 
non-negative, non-decreasing function of x. H-RAMs differ significantly from 
“real” machines in a number of ways (e.g., they lack block transfer) but we feel 
they are an excellent model for capturing and isolating the issues of temporal 
locality. We then define the notion of computation dag, by which we model 
a computation as a set of operations and their data dependencies. Informally, 
a computation admits many implementations which differ along two important 
dimensions: (i) the order of execution of the operations, and (ii) the way data are 
mapped to memory locations during execution. In the next sections we examine 
the impact of each dimension on the portability of the implementation. 

In Section 3 we assume that the operation schedule has been fixed, and
we consider how to best manage memory for that schedule. First, we introduce
the key notion of $W$-width decomposition of a schedule $\tau$ and the corresponding
parameter $r_\tau(W)$, called the space reuse, which informally corresponds to the
number of subcomputations, each using approximately $W$ space, into which $\tau$
can be partitioned. The parameters $r_\tau(2^\ell)$ (where $\ell = 0, \ldots, \lceil \log S \rceil$ and $S$ is the
minimum space required to execute $\tau$) are sufficient to characterize the optimal
execution time of $\tau$ on a wide class of HRAMs, yielding the first quantitative
characterization of the qualitative notion of temporal locality. We provide lower
bounds matched, within a logarithmic factor, by the decomposition-tree memory
management strategy, and within a constant factor on a wide class of machines,
by the reoccurrence-width strategy. Neither of these strategies takes into account
the access function $a(x)$, indicating that, for a fixed operation schedule, efficient
portability is indeed achievable.

In Section 4 we turn our attention to the impact of the operation schedule.
Several interesting cases of computations (such as some algorithms for FFT,
matrix multiplication, and sorting) are known to admit optimal schedules for
classes of uniform HRAMs [AACS87, FLPR99]. However, our main findings are
on the negative side: by developing ideas formulated in [P98], we show that,
at least for some computations, the optimal schedule is different on different
machines, both in the case of recomputation (multiple executions of the same
operations are allowed) and of no recomputation. In each case we also provide
lower bounds to the loss of time performance that any schedule suffers on at
least one machine of a reasonable class. These results require a novel approach to
analyze tradeoffs between the number of accesses in different regions, tradeoffs
not captured by previous lower bound techniques for the memory hierarchy.
One consequence of our results is that, to obtain generally portable code,
such code must be somehow parametrized with parameters adaptable to those
of the memory hierarchy, either statically, by a compiler, or dynamically, like in
the systems FFTW [?], ATLAS [?], and PHiPAC [BACD97]. While we
take some preliminary steps to estimate the size of the parameter space of the
code as a function of the acceptable loss of performance, this remains largely an
uncharted and promising territory for further investigations.



2 Models of Machines and Computations

We shall model machines as Hierarchical Random Access Machines (HRAMs), a
model very close to the HMM of [AACS87] and to the H-RAM of [BP97-99]. An
HRAM consists of a serial processor, a program memory, and a data memory.
Both memories are random access and consist of locations with addresses ranging
over the nonnegative integers. Program memory is assumed to be very fast, with
access time subsumed within the processor cycle. Data memory is hierarchical,
with an access to address $x$ taking $a(x)$ units of time. The access function $a(\cdot)$
satisfies $0 \le a(x) \le a(x + 1)$, for any $x \ge 0$. To stress the role of the access
function, we shall use the notation $a(x)$-HRAM for the machine. It is also quite
useful to introduce the cumulative access function $A(x) = \sum_{y=0}^{x-1} a(y)$ for $x \ge 1$,
with $A(0) = 0$.

As has emerged since some of the early investigations [ST85, AACS87], comparison
between upper and lower bounds on HRAM execution time often leads
to consideration of the ratio $a(y)/a(x)$ of access times for a suitably bounded
value of the ratio $y/x$ of the corresponding addresses, motivating the following
definition.
Definition 1. For given real parameters $\xi, \alpha \ge 1$, an access function $a(x)$ or
the corresponding $a(x)$-HRAM is said to be $(\xi, \alpha)$-uniform if $a(\xi x) \le \alpha a(x)$, for
any $x$.
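
As a concrete illustration of these notions, the short sketch below evaluates a cumulative access function and tests the uniformity condition on a finite range of addresses. It rests on two assumptions: the cumulative access function is taken to be the sum of $a(y)$ over all addresses $y < x$ (so that $A(0) = 0$), and the uniformity condition is read as $a(\xi x) \le \alpha a(x)$. The function names are hypothetical, and a finite check is of course evidence, not a proof, of uniformity.

```python
import math
from typing import Callable, Iterable


def cumulative_access(a: Callable[[int], float], x: int) -> float:
    """A(x): total time to touch addresses 0, 1, ..., x-1 once each."""
    return sum(a(y) for y in range(x))


def is_uniform(a: Callable[[int], float], xi: int, alpha: float,
               xs: Iterable[int]) -> bool:
    """Check a(xi * x) <= alpha * a(x) on the sample addresses xs."""
    return all(a(xi * x) <= alpha * a(x) for x in xs)


def sqrt_access(x: int) -> float:
    """Example access function a(x) = sqrt(x)."""
    return math.sqrt(x)


def log_access(x: int) -> float:
    """Example access function a(x) = log2(x + 2)."""
    return math.log2(x + 2)
```

For $a(x) = \sqrt{x}$ doubling the address multiplies the access time by $\sqrt{2}$, so the check succeeds with $(\xi, \alpha) = (2, 1.5)$ and fails with $(2, 1.2)$; the logarithmic access function passes with $(2, 2)$.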

We now turn our attention to computations, which we model as computation
dags ($+$ and $\sum$ denote the union operation restricted to disjoint sets).

Definition 2. A computation directed acyclic graph (CDAG) is a 4-tuple $C =
(I, V, E, O)$ of finite sets such that: (1) $I \cap V = \emptyset$; (2) $E \subseteq (I + V) \times V$ is the set
of arcs; (3) $G = (I + V, E)$ is a directed acyclic graph with no isolated vertices;
(4) $I$ is called the input set; (5) $V$ is called the operation set and all its vertices
have one or two incoming arcs; (6) $O \subseteq I + V$ is called the output set.

Informally, with each vertex in $I + V$ we associate a value. For a vertex in $I$,
the value is externally supplied and hence considered an input to the computation.
For a vertex in $V$, the value is the result of an operation whose operands
are provided by the predecessors of that vertex. We have restricted the number 
of operands to be one or two. The set O defines which values, among all the 
ones being input or computed, form the desired output set. The main advantage 
of the CDAG model for the present investigation is that it specifies neither the 
order in which the operations have to be executed nor the memory locations 
where data have to be stored, which we consider degrees of freedom for the implementation.
Furthermore, hierarchy related space complexity issues have been
extensively investigated using the CDAG model [?].

3 Memory Management for a Fixed Operation Schedule 

In this section, we consider a computation modeled by a CDAG $C = (I, V, E, O)$,
together with a given feasible schedule $\tau$ of its operations. For simplicity, we develop
the analysis for a schedule without recomputation, modeled as a topological
sorting $\tau = (v_1, \ldots, v_N)$ of the elements of $V$, that is, whenever $(v_i, v_j) \in E$, then
$i < j$. All the results of the present section are readily generalized to schedules
with recomputation.

Given $\tau$, a program computing $C$ must still choose in which memory locations,
over the course of the computation, the values involved will be stored, the
objective being a minimization of the running time over HRAMs with as wide a 
range of access functions a(x) as possible. 

We begin by studying some lower limits that any memory map has to satisfy 
for a given schedule. 



3.1 Lower Bounds 

The intuition that, at any given time, at least one memory location must be used 
for each value already available and yet needed leads to the following definition 
and to the subsequent propositions. 
Definition 3. For $0 \le i \le N$, the width at $i$ of a schedule $\tau = (v_1, \ldots, v_N)$ of
a CDAG $C = (I, V, E, O)$ is the quantity $W_\tau(i) = |Z_\tau(i)|$, where

$Z_\tau(i) = \{u \in I \cup \{v_1, \ldots, v_i\} : (u \in O) \vee (\exists j > i : (u, v_j) \in E)\}$.

The width of a schedule $\tau = (v_1, \ldots, v_N)$ is the quantity $W_\tau = \max_i W_\tau(i)$.
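
The width can be computed directly from its definition: a value is counted at step $i$ if it is already available (an input or one of the first $i$ operations) and is either an output or an operand of some later operation. The sketch below assumes the reading of the definition with $0 \le i \le N$ and represents the CDAG by plain Python collections; the function name `widths` and the example graph are illustrative.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Set, Tuple

Vertex = Hashable


def widths(inputs: Sequence[Vertex], schedule: Sequence[Vertex],
           edges: Iterable[Tuple[Vertex, Vertex]],
           outputs: Set[Vertex]) -> List[int]:
    """Return [W(0), W(1), ..., W(N)] for the schedule (v_1, ..., v_N)."""
    position: Dict[Vertex, int] = {v: k + 1 for k, v in enumerate(schedule)}
    last_use: Dict[Vertex, int] = {}
    for u, v in edges:  # step at which u is needed for the last time
        last_use[u] = max(last_use.get(u, 0), position[v])
    result = []
    for i in range(len(schedule) + 1):
        available = list(inputs) + list(schedule[:i])
        live = [u for u in available
                if u in outputs or last_use.get(u, 0) > i]
        result.append(len(live))
    return result
```

For the computation $s_3 = (a + b) + (c + d)$ with schedule $(s_1, s_2, s_3)$, the widths are 4, 3, 2, 1: all four inputs are live at the start, and each addition replaces two live values by one. The width of the schedule is therefore 4.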

Next, we begin to develop the relationship between width, space, and time. 

Proposition 1. The amount $S_\tau$ of space needed by any HRAM execution of
CDAG $C$ according to schedule $\tau$ satisfies $S_\tau \ge W_\tau$.

Proposition 2. The time $T_\tau$ needed by an $a(x)$-HRAM execution of CDAG $C$
according to schedule $\tau$ satisfies $T_\tau \ge \sum_{k=0}^{W_\tau - 1} a(k) = A(W_\tau)$.

The analysis embodied by Propositions 1 and 2 can be refined by partitioning a
computation of $C$ into contiguous subcomputations and considering their separate
accesses. Next, we review the notion of topological partition [BP97-99] and
we formalize the notion of subcomputation.



Definition 4. The sequence (Vi, V 2 , ■ • ■ , Vq) of nonempty subsets of V is called 
a topological partition of CDAG C = (I,V,E,0) if V = and E C 

i^h=l ^k=h+l (-^ + X 14)- 

Definition 5. Let $(V_1, V_2, \ldots, V_q)$ be a topological partition of CDAG
$C = (I, V, E, O)$. Then, for $h = 1, 2, \ldots, q$, the subcomputation of $C$ induced
by $V_h$ is the CDAG $C_h = (I_h, V_h, E_h, O_h)$, where:



Proposition 3. With the notation of Definition 5, let $\tau_h$ be a schedule for subcomputation
$C_h$, for $h = 1, 2, \ldots, q$, and let the concatenation $\tau = \langle \tau_1, \ldots, \tau_q \rangle$
be the corresponding schedule for $C$. Then, for any $S \ge 0$, the number $Q(S)$ of
accesses to locations with address $x \ge S$ made by any HRAM execution of $C$
according to schedule $\tau$ satisfies $Q(S) \ge \sum_{h=1}^{q} (W_{\tau_h} - S)$.

Proposition 4. With the notation of Proposition 3, the running time of any
execution of $C$ according to schedule $\tau$ on an $a(x)$-HRAM satisfies



Of particular interest for our developments are topological partitions where 
all the subcomputations, except possibly for the last one, have nearly the same 
width. These partitions, to be defined next, will enable us to derive tight lower 
and upper bounds on computation time. 



$I_h = \{u \notin V_h : \exists v \in V_h \text{ s.t. } (u, v) \in E\}$,

$O_h = \{u \in I_h + V_h : u \in O \vee (\exists v \notin V_h \text{ s.t. } (u, v) \in E)\}$,

$E_h = E \cap ((I_h + V_h) \times V_h)$.




( 1 ) 
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Definition 6. Given a CDAG C = (I,V,E,0), we say that < ti, . . . ,Tq > is 
the LL-width decomposition of a schedule r for C if (i) t =< . . . ,Tq > , (ii) 

< W for h = 1,2,. . . ,q, and (Hi) W^+ > W for h = 1,2, . . . ,q — 1, where 

t(( is obtained by appending to the next operation in r. 

All subcomputations in a VL-width decomposition, except possibly for the 
last one, are full in the sense that their width is at least W — 2. The number 
rriW) £ {q — l,g} of full subcomputations, hereafter referred to as the space 
reuse (at W), gives the following valuable information on the access complexity: 

Qr{W/2-2)>{W/2)rr{W). (2) 

This bound follows easily from Proposition 3 by setting $S = W/2 - 2$, $q = r_\tau(W)$,
and $W_{\tau_h} - S = W - (W/2 - 2) > W/2$. Let $H$ be the smallest integer greater than
two such that $W_\tau \le 2^H$. For $2 < \ell \le H$, let $n_\ell = r_\tau(2^\ell)$ be the number of full
subcomputations in the $2^\ell$-width decomposition of $\tau$. Also, let $2n_2 = |E| + |V|$.
It turns out that the $n_\ell$'s closely characterize the temporal locality properties of a
schedule, as the next result shows from the lower bound perspective.

Theorem 1. Let $\tau$ be a schedule for CDAG $C$ and let $n_\ell$ be defined as above,
for $\ell = 2, 3, \ldots, H$. Then, the time of any execution of $C$ according to schedule
$\tau$ on an $a(x)$-HRAM satisfies the bound

$$T_\tau \ge \sum_{\ell=2}^{H-1} (n_\ell - 2n_{\ell+1})\, 2^{\ell-1}\, a(2^{\ell-1} - 2). \tag{3}$$



3.2 Efficient Strategies for Memory Management 

We now turn our attention to constructive ways of defining a memory map for
a given schedule $\tau = (v_1, v_2, \ldots, v_N)$ of computation $C$. It can be easily shown
that the lower bound given by Proposition 1 is tight, i.e., $S_\tau = W_\tau$. However,
when the objective is the minimization of execution time on an HRAM, it is
crucial not only to reduce the overall space, but also to bias the utilization
of space toward the smaller, faster locations. We have developed two memory
managers. The first, named the decomposition-tree memory manager, uses a
variant of the topological separator proposed in [BP97-99] that yields a tighter
control of space, essential for our present purposes.

Decomposition-Tree Memory Management

1. Partition $\tau$ into two subschedules $\tau_0$ and $\tau_1$.

2. Reorganize the inputs of $\tau$ so that the inputs of $\tau_0$ lie in the address range
$[0, \ldots, W_{\tau_0} - 1]$, and the remaining inputs of $\tau$ lie in the range $[W_{\tau_0}, \ldots, W_\tau - 1]$.

3. Recursively execute $\tau_0$ within the address range $[0, \ldots, W_{\tau_0} - 1]$.

4. Reorganize data in memory so that the inputs of $\tau_1$ lie in the address range
$[0, \ldots, W_{\tau_1} - 1]$, and the remaining data in the range $[W_{\tau_1}, \ldots, W_\tau - 1]$.

5. Recursively execute $\tau_1$ within the address range $[0, \ldots, W_{\tau_1} - 1]$.

The formal description of the algorithm and its analysis we leave for the final
version of the paper. They lead to the following result:

Theorem 2. The balanced, binary decomposition-tree strategy for memory
management yields an $a(x)$-HRAM program $P$ which correctly executes CDAG
$C = (V, E, I, O)$ with schedule $\tau$ in optimal space $S_\tau = W_\tau$ and time

T, < (dflogiV] +2 )T;p‘, (4) 

where $T_\tau^{opt}$ denotes the minimum time achievable on an $a(x)$-HRAM.

Theorem 2 makes no restrictive assumption on the access function $a(x)$
modeling HRAM delays. Independence of the access function poses rather stringent
constraints on the memory manager, as it implies that no address $x \ge W_\tau = S_\tau$
can be used (otherwise, there would be no way to relate the corresponding access
to $T_\tau^{opt}$). However, stronger results can be obtained if we restrict our attention
to uniform HRAMs, which are both physically and technologically sound. In this
direction, we develop an alternate memory management strategy which achieves
optimality to within a constant factor on uniform HRAMs. Unlike the
tree-decomposition approach, the strategy presented below relocates data in memory
only in correspondence with operations that involve those data, at an address
(approximately) proportional to the amount of space needed by the
subcomputation intervening between the current operation and the next operation where
that value occurs as an operand.

Let $H$ be the integer such that $2^{H-1} < W_\tau \le 2^H$. We define memory region
$M_2 = \{0, 1\}$ and, for $\ell = 3, \ldots, H$, memory region $M_\ell = \{2^\ell - 6, \ldots, 2^{\ell+1} - 7\}$,
of size $2^\ell$.

We shall say “store in region $M_\ell$” as an abbreviation for “store in the
location of smallest address among those currently available in $M_\ell$”. We
shall also use the shorthand $W_\tau(u|@v_k)$ to indicate the width of the
subcomputation between the two consecutive occurrences $v_i$ and $v_j$ of $u$, with $i < k < j$.
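
The region layout can be checked numerically. The sketch below assumes the reading $M_2 = \{0, 1\}$ and $M_\ell = \{2^\ell - 6, \ldots, 2^{\ell+1} - 7\}$ for $\ell \ge 3$, so that consecutive regions tile the address space without gaps. The helper names `region_bounds` and `region_level_for_width` are hypothetical and only illustrate the arithmetic.

```python
def region_bounds(level):
    """Return (first_address, last_address) of memory region M_level."""
    if level < 2:
        raise ValueError("regions are defined for level >= 2")
    if level == 2:
        return (0, 1)
    return (2 ** level - 6, 2 ** (level + 1) - 7)


def region_level_for_width(width):
    """Return max(3, ceil(log2(width))), the region index used for a given width."""
    if width < 1:
        raise ValueError("width must be positive")
    # (width - 1).bit_length() equals ceil(log2(width)) for integers width >= 1.
    return max(3, (width - 1).bit_length())


def region_size(level):
    first, last = region_bounds(level)
    return last - first + 1
```

Under this reading, region $M_3$ starts right after $M_2$ at address 2, and each region $M_\ell$ with $\ell \ge 3$ holds exactly $2^\ell$ locations.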

Reoccurrence-Width Memory Management

1. Input. We assume that the inputs $u_0, u_1, \ldots, u_{|I|-1}$ are originally available
at the $|I|$ lowest memory locations. For each $h = |I| - 1, \ldots, 1, 0$, move input
$u_h$ to the smallest region of size at least $W_\tau(u_h|@v_0)$, excluding $M_2$.

2. Operations. For $k = 1, 2, \ldots, N$, do:

(i) Load: Move the operand(s) of $v_k$ to location 0 (and 1), in $M_2$.

(ii) Compute: Execute operation $v_k$, storing the result in $M_\ell$, with
$\ell = \max(3, \lceil \log W_\tau(v_k|@v_k) \rceil)$.

(iii) Relocate: Move each operand $u$ of $v_k$ subsequently needed in the
computation to $M_\ell$, with $\ell = \max(3, \lceil \log W_\tau(u|@v_k) \rceil)$.

3. Output. The outputs $z_0, z_1, \ldots, z_{|O|-1}$ are now stored in nondecreasing order
of the quantity $\ell = \max(3, \lceil \log W_\tau(z_h|@v_{N+1}) \rceil)$. For each $h = 0, 1, \ldots, |O| - 1$,
move output $z_h$ to location $h$.

A somewhat complex analysis establishes the correctness of the above
strategy and provides an upper bound to the resulting HRAM running time, in terms
of the access function $a(x)$ (and the related $A(x) = \sum_{y=0}^{x} a(y)$) characterizing
the machine and in terms of the parameters $n_\ell$ characterizing the computation.
The resulting upper bound is also compared to the lower bound of Theorem 1,
thus establishing the optimality, to within a constant factor, of the
reoccurrence-width strategy, for a wide class of access functions.

Theorem 3. The reoccurrence-width strategy yields an $a(x)$-HRAM program $P$
which executes CDAG $C = (V, E, I, O)$ with schedule $\tau$ in time

$$T_\tau \le \sum_{\ell=2}^{H-1} (n_\ell - 2n_{\ell+1})\, 2^{\ell+1}\, a(2^{\ell+2} - 7) + A(|I|) + A(|O|). \tag{5}$$

If $T_\tau^{opt}$ denotes the minimum time achievable on an $a(x)$-HRAM, we have:

$$T_\tau \le (4\gamma(W_\tau) + 1)\, T_\tau^{opt}, \tag{6}$$

where $\gamma(W_\tau) = \max_{\ell \in \{3, \ldots, H-1\}} \big(a(2^{\ell+2} - 7)/a(2^{\ell-1} - 2)\big)$.

For most functions $a(x)$ of interest, the quantity $a(8x + 9)/a(x)$, and hence the
quantity $\gamma(W_\tau)$ in Relation 6, is bounded above by a constant. E.g., if $a(x)$
is $(25/2, \alpha)$-uniform, then $\gamma(W_\tau) \le \alpha$. In the interesting case where
$a(x) = a_0\sqrt{x + 1}$, modeling speed-of-light delays in planar layouts, we obtain
$\gamma(W_\tau) \le \sqrt{26/3} < 3$.
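
The constant for the square-root access function can be verified with a short numerical check. The sketch assumes $a(x) = a_0\sqrt{x + 1}$ and takes the maximum of $a(2^{\ell+2} - 7)/a(2^{\ell-1} - 2)$ over $\ell = 3, \ldots, H - 1$; the function names and the default $a_0 = 1$ are illustrative assumptions.

```python
import math


def sqrt_access(x, a0=1.0):
    """Access time a(x) = a0 * sqrt(x + 1) (speed-of-light delay, planar layout)."""
    return a0 * math.sqrt(x + 1)


def gamma(H, a=sqrt_access):
    """Largest ratio a(2^(l+2) - 7) / a(2^(l-1) - 2) over l = 3, ..., H - 1."""
    if H < 4:
        raise ValueError("need H >= 4 so that the range 3..H-1 is nonempty")
    return max(a(2 ** (l + 2) - 7) / a(2 ** (l - 1) - 2) for l in range(3, H))
```

For this access function the ratio is largest at $\ell = 3$, where it equals $\sqrt{26/3}$, and it decreases toward $\sqrt{8}$ as $\ell$ grows, so the bound does not depend on $H$.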



4 Optimal Schedule and Memory Access Function 



Some CDAGs, such as those corresponding to matrix multiplication and to the
radix-2 FFT, do admit a portable schedule that is simultaneously (near) optimal
for all machines (in a wide class); it is natural to ask whether any CDAG does 
[IPDSfKI . Below, we answer this question negatively by exhibiting CDAGs 

for which it can be shown that there are HRAM pairs such that, for any schedule, 
at least on one of the two machines, time performance is significantly suboptimal, 
i.e. by a factor polynomial (with exponent between 0 and 1) in the number N of 
operations. It is quite possible that a given CDAG admits a portable schedule if 
recomputation is allowed but it does not when recomputation is disallowed, and 
vice versa. For this reason, we deal with the two cases separately. 



Theorem 4. Let $M_0$ denote a 1-HRAM (the standard RAM) and $M_{1/2}$ denote
a $\sqrt{x}$-HRAM. For infinitely many $N$, there exists a CDAG $C(N)$ of $N$
operations such that the running time of any schedule, allowing recomputation, is
suboptimal by a factor on $M_0$ or on $M_{1/2}$.

Space limitations force us once more to leave the proof to the full version of the
paper, but we will attempt to convey an idea of the line of argument. The CDAG
$C(N)$ is actually a member of a family whose generic element, with $r$ rows and
$c$ columns, has one input and $N = rc - 1$ operation nodes. The nodes of such an
element are connected as a linear chain which can be visualized as folded into $r$
rows of $c$ columns each, with additional arcs connecting nodes that are consecutive
in a column. Figure 1 illustrates a member of the family. Intuitively, on machine
$M_0$, the best schedule is the one that performs no recomputation. On machine
$M_{1/2}$, instead, a better schedule computes






G. Bilardi and E. Peserico 




Fig. 1. The CDAG Rt 



node $v_j$ by first recomputing the entire column above it from the column above
the previous node (and from $v_0$ if $j = 0$), using only $O(r)$ memory locations for
the whole computation. Our analysis shows that the class of all schedules can be
partitioned into two subclasses depending on whether (according to a suitable
technical measure) the amount of recomputation being performed is above or
below a certain threshold. It is then shown that a schedule above (respectively,
below) the threshold is bound to incur a significant time loss on $M_0$ (respectively,
$M_{1/2}$). We observe that this CDAG can be viewed as the CDAG describing the
processing performed during $N - 1$ steps by a simple type of digital filter of order $c$. It
appears quite likely that similar behaviours will be exhibited by other natural
CDAGs. A similar result can be obtained when ruling out recomputation:

Theorem 5. Let Ma denote an x°‘-HRAM. Given any pair a and l3 of real 
numbers with 0 < a < /3 < 1, for infinitely many N, there exists a CDAG 
of N operations such that the running time of any schedule Ta^p{N) 

with no recomputation is suboptimal by a factor [2{N ~ ) on or on Mp. 

The CDAG Ca,fl{N) referred to in the statement of Theorem 0 belongs to a 
family of CDAGs Gtf^ „ an element of which is illustrated in Figure |3 Informally, 
„ consists of p almost disjoint isomorphic subgraphs, only sharing a set of n 
inputs ig, ...,in-i- Subgraph h has a backbone consisting of a long chain: 






with as an input and as the designated output. In addition, takes 

ik%n as an input, and takes as an input. For convenience, we shall refer 
to the three portions of the operation chain as the u-chain, the v-chain,
and the w-chain, respectively.

A key property is that, between the computation of the first and of the last 
vertex of each v-chain, all the values of the corresponding u-chain must reside 
in memory, since they will be needed for the computation of the w-chain. Then, 
the following tradeoff arises: as the number of v-chains simultaneously under
execution increases, the space and time to store and retrieve the corresponding
u-chains increase, while the time to access the $i$ inputs can be made to decrease,
since, once accessed, such an input can be used to advance the computation of
several v-chains. Ultimately, it turns out that the optimal point in this tradeoff
changes with the access function of the HRAM, with the optimal number of
v-chains under simultaneous execution decreasing for machines with “steeper”
access functions. The systematic, quantitative analysis of the above tradeoff is
rather subtle and left for the full version of the paper.

Theorems 4 and 5 do not rule out the possibility of a parametrized program
which, when the parameters are selected (perhaps by a knowledgeable compiler
or run-time system) as a function of $a(x)$, achieves optimal performance. The next
results explore the question of how many different schedules have to be generated, 
as the parameters span over their range, in order to achieve a performance within 
a given factor from optimal on any machine from a suitable class. 

Definition 7. A set of schedules $T = \{\tau_1, \ldots, \tau_r\}$ of a given CDAG $C$ is said
to be $s$-optimal with respect to a class $\mathcal{H}$ of HRAMs if, for any $M \in \mathcal{H}$, there
is a $\tau \in T$ such that $C$ can be executed on $M$ with schedule $\tau$ in time within a
factor $s$ of optimal.
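
Definition 7 amounts to a simple coverage check once running times are known. The following sketch assumes hypothetical tables `times[machine][schedule]` (time of the CDAG on that machine under that schedule) and `optimal[machine]` (the optimal time on that machine); neither the tables nor the function name come from the paper.

```python
def is_s_optimal(times, optimal, s):
    """Check Definition 7 for the schedules that appear in `times`.

    times:   dict machine -> dict schedule -> running time (assumed given)
    optimal: dict machine -> optimal running time on that machine
    s:       allowed slowdown factor
    """
    return all(min(times[m].values()) <= s * optimal[m] for m in optimal)


# Toy data: two machines, two candidate schedules.
toy_times = {"M0": {"t1": 10, "t2": 30}, "M1": {"t1": 50, "t2": 12}}
toy_optimal = {"M0": 8, "M1": 10}
```

In the toy data neither schedule alone is within a factor 1.5 of optimal on both machines, but the pair is, which is the situation the definition is meant to capture.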

Theorem 6. Let s) be the class of the (^, y/s)-uniform HRAMs. LetC{^, s) 
be the set of CDAGs C such that no optimal schedule of C on any HRAM in 
s) requires more than N space. Then we have: 

— (Upper Bound.) For any C G C(f, s), there is a set T of schedules s-optimal 

with respect to 'H{f,s) with size |T| < fa 

— (Lower Bound.) There is a constant K such that, for any s > 1, there 
is an infinite sequence of CDAGs C{N) in C{f,s) of N operations such 
that any set of schedule T s-optimal with respect to 'H{^,s) has size |T| > 

^if/(logs-HoglogAT)^

From Theorem 6 we see that, although a CDAG might well admit an
exponential number of schedules, a suitable polynomial subset of them always
contains one with performance within a constant factor of optimal. We remark
that the proof of the preceding upper bound result (left to the full version of the
paper) exploits approximability properties of the access function $a(x)$, making
no use of any special structure of the relevant CDAGs. We also show that, at
least for some CDAGs, this number of schedules cannot be substantially
reduced. The detailed description and analysis of such CDAGs is again left to the
full version, but the key idea is to consider CDAGs composed of a family of
$O(\log N/(\log s + \log \log N))$ sets of graphs of the type introduced in Theorems
4 and 5. By careful tuning of the parameters, we can make the asymptotic time
requirements of the execution of different sets on an $a(x)$-HRAM depend only
on the behaviour of $a(x)$ on different, disjoint intervals of addresses, forcing in
turn, if $s$-optimality is to be achieved, a different schedule of the global CDAG
according to whether $a(x)$ is sufficiently “steep” or not on each interval. The
dependence of the size of $s$-optimal sets of schedules upon the structure of the
CDAG is a very interesting problem and certainly deserves investigation.

5 Conclusions 

We have proposed the width framework leading to a quantitative definition of 
temporal locality which enables performance estimates for an algorithm on dif- 
ferent hierarchical systems. Then, we have explored the efficient portability of 
a fixed implementation of an algorithm on the spectrum of different systems. 
We have found that the exploitation of the inherent temporal locality of an 
algorithm through the memory management is quite amenable to a machine- 
independent optimization. Instead, the optimization of the operation schedule 
generally requires some knowledge of the target memory system. 

This work can be extended in several directions. More general memory mod- 
els need to be considered, to include block transfer, pipelined accesses, and par- 
allel memories. Indeed, the width framework has already proven useful in the 
investigation of sequential, pipelined hierarchies \nEm- More flexible models 
of portability are also of interest, where the code is allowed to somehow adapt to 
the machine. The previous section touches on such issues, but a systematic in- 
vestigation of “parametrized” algorithms remains desirable. A final observation, 
for which we are indebted to Peter M. Kogge, is that several of our results could 
be reinterpreted by viewing a(x) as the energy required to retrieve the content 
of location x, a metric of interest in the context of low power computing.

Abstract. We present tight upper and lower bounds for the problem
of constructing evolutionary trees in the experiment model. We describe
an algorithm which constructs an evolutionary tree of $n$ species in time
$O(nd \log_d n)$ using at most $n\lceil d/2 \rceil(\log_{2\lceil d/2 \rceil - 1} n + O(1))$ experiments for
$d > 2$, and at most $n(\log n + O(1))$ experiments for $d = 2$, where $d$ is the
degree of the tree. This improves the previous best upper bound by a
factor $\Theta(\log d)$. For $d = 2$ the previously best algorithm with running time
$O(n \log n)$ had a bound of $4n \log n$ on the number of experiments. By
an explicit adversary argument, we show an $\Omega(nd \log_d n)$ lower bound,
matching our upper bounds and improving the previous best lower bound
by a factor $\Theta(\log_d n)$. Central to our algorithm is the construction and
maintenance of separator trees of small height, which may be of
independent interest.



1 Introduction 

The evolutionary relationship for a set of species is commonly described by an 
evolutionary tree, where the leaves correspond to the species, the root corre- 
sponds to the most recent common ancestor for the species, and the internal 
nodes correspond to the points in time where the evolution has diverged in dif- 
ferent directions. The evolutionary history for a set of species is rarely known, 
hence estimating the true evolutionary tree for a set of species from obtainable 
information about the species is of great interest. Estimating the true evolu- 
tionary tree computationally requires a model describing how to use available 
information about species to estimate aspects of the true evolutionary tree. Given 
a model, the problem of estimating the true evolutionary tree is often referred 
to as constructing the evolutionary tree in that model.

Fig. 1. The four possible outcomes of an experiment for three species a, b and c



In this paper we study the problem of constructing evolutionary trees in 
the experiment model proposed by Kannan, Lawler and Warnow in |h]. In this 
model the information about the species is obtained by experiments which can 
yield the evolutionary tree for any triplet of species, cf. Fig. 1. The problem of
constructing an evolutionary tree for a set of n species in the experiment model 
is to construct a rooted tree with no unary internal nodes and n leaves labeled 
with the species such that the topology of the constructed tree is consistent 
with all possible experiments involving the species. Hence, the topology of the 
constructed tree should be such that the induced tree for any three species is 
equal to the tree returned by an experiment on those three species. 

The relevance of the experiment model depends on the possibility of per- 
forming experiments. A standard way to express phylogenetic information is by 
a distance matrix. A distance matrix for a set of species is a matrix where
entry $M_{ij}$ represents the evolutionary distance between species $i$ and $j$, measured
by some biological method (see |0| for further details). For three species $a$, $b$
and $c$ where $M_{ab} < \min\{M_{ac}, M_{bc}\}$ it is natural to conclude that the least
common ancestor of $a$ and $b$ is below the least common ancestor of $a$ and $c$, i.e. the
outcome of an experiment on $a$, $b$ and $c$ can be decided by inspecting $M_{ab}$, $M_{ac}$
and $M_{bc}$. The consistency of experiments performed by inspecting a distance
matrix depends entirely on the distance matrix. Kannan et al. in jHI define a 
distance matrix as noisy-ultrametric if there exists a rooted evolutionary tree 
such that for all triplets of species $a$, $b$ and $c$ it holds that $M_{ab} < \min\{M_{ac}, M_{bc}\}$
if and only if the least common ancestor of a and b is below the least common 
ancestor of a and c in the rooted evolutionary tree. Hence, if a noisy-ultrametric 
distance matrix for the set of species can be obtained, it can be used to per- 
form experiments consistently. Another and more direct method for performing 
experiments is DNA-DNA hybridization as described by Sibley and Ahlquist 
in 0. In this experimental technique one measures the temperature at which 
single stranded DNA from two different species bind together. The binding tem- 
perature is correlated to the evolutionary distance, i.e. by measuring the binding 
temperatures between DNA strands from three species one can decide the out- 
come of the experiment by deciding which pair of the three species bind together 
at the highest temperature. 
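
The distance-matrix rule described above can be sketched as a small experiment oracle. The code assumes a symmetric matrix stored as a nested dictionary `M[x][y]`, and it treats the case where no pair is strictly closest as the unresolved (star) outcome; both the encoding of outcomes as nested tuples and the function name `experiment` are illustrative assumptions.

```python
def experiment(M, a, b, c):
    """Decide the topology of species a, b, c from a distance matrix M.

    Returns ((x, y), z) when x and y are strictly closest to each other,
    and the flat triple (a, b, c) when no pair is strictly closest.
    """
    ab, ac, bc = M[a][b], M[a][c], M[b][c]
    if ab < min(ac, bc):
        return ((a, b), c)
    if ac < min(ab, bc):
        return ((a, c), b)
    if bc < min(ab, ac):
        return ((b, c), a)
    return (a, b, c)


# Toy matrix: a and b are siblings, c is further away, d is equidistant to all.
toy_matrix = {
    "a": {"b": 1, "c": 2, "d": 3},
    "b": {"a": 1, "c": 2, "d": 3},
    "c": {"a": 2, "b": 2, "d": 3},
    "d": {"a": 3, "b": 3, "c": 3},
}
```

With a noisy-ultrametric matrix, the answer of such an oracle does not depend on the order in which the three species are passed, which is what makes the experiments consistent.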

Kannan et al. introduce and study the experiment model in jH] under the
assumption that experiments are flawless in the sense that they do not contradict
each other, i.e. it is always possible to construct an evolutionary tree for a set
of species that is consistent with all possible experiments involving the species.
They present algorithms for constructing evolutionary trees with bounded as
well as unbounded degree, where the degree of a tree is the maximum number
of children for an internal node. For constructing binary evolutionary trees they
present three different algorithms with running times $O(n \log n)$, $O(n \log^2 n)$
and $O(n^2)$ respectively, using $4n \log n$, $n \log_{3/2} n$ and $n \log n$ experiments
respectively, where $\log n$ denotes $\log_2 n$. For constructing an evolutionary tree of
degree $d$ they present an algorithm with running time $O(n^2)$ using $O(dn \log n)$
experiments. Finally, for the general case they present an algorithm with
running time $O(n^2)$ using $O(n^2)$ experiments together with a matching lower bound.
Kao, Lingas, and Östlin in 0 present a randomized algorithm for constructing
evolutionary trees of degree $d$ with expected running time $O(nd \log n \log \log n)$.
They also prove a lower bound $\Omega(n \log n + nd)$ on the number of experiments.
The best algorithm so far for constructing evolutionary trees of degree $d$ is due
to Lingas, Olsson, and Östlin, who in jSj present an algorithm with running time
$O(nd \log n)$ using the same number of experiments.

In this paper we present the first tight upper and lower bounds for the
problem of constructing evolutionary trees of degree $d$ in the experiment model.
We present an algorithm which constructs an evolutionary tree for $n$ species
in time $O(nd \log_d n)$ using at most $n\lceil d/2 \rceil(\log_{2\lceil d/2 \rceil - 1} n + O(1))$ experiments
for $d > 2$, and at most $n(\log n + O(1))$ experiments for $d = 2$, where $d$ is the
degree of the constructed tree. The algorithm is a further development of an
algorithm by Lingas et al. Our construction improves the previous best upper bound
by a factor $\Theta(\log d)$. For $d = 2$ the previously best algorithm with running time
$O(n \log n)$ had a bound of $4n \log n$ on the number of experiments. The improved
constant factors on the number of experiments are important because
experiments are likely to be expensive in practice, cf. Kannan et al. jO|. By an explicit
adversary argument, we show an $\Omega(nd \log_d n)$ lower bound, matching our upper
bounds and improving the previous best lower bound by a factor $\Theta(\log_d n)$.

Our algorithm also supports the insertion of new species with a running
time of $O(md \log_d(n + m))$ using at most $m\lceil d/2 \rceil(\log_{2\lceil d/2 \rceil - 1}(n + m) + O(1))$
experiments for $d > 2$, and at most $m(\log(n + m) + O(1))$ experiments for $d = 2$,
where $n$ is the number of species in the tree to begin with, $m$ is the number
of insertions, and $d$ is the maximum degree of the tree during the sequence
of insertions. Central to our algorithm is the construction and maintenance of
separator trees of small height. These algorithms may be of independent interest.
However, due to lack of space we have omitted the details on separator trees.
For further details we refer the reader to the full version of the paper jS].

The rest of this paper is organized as follows. In Sect. 2 we define separator
trees and state results on the construction and efficient maintenance of
separator trees of small height. In Sect. 3 we present our algorithm for constructing
and maintaining evolutionary trees. In Sect. 4 and 5 the lower bound is proved
using an explicit adversary argument. The adversary strategy used is an
extension of an adversary used by Borodin, Guibas, Lynch, and Yao 0 for proving
a trade-off between the preprocessing time of a set of elements and membership
queries, and Brodal, Chaudhuri, and Radhakrishnan 0 for proving a trade-off
between the update time of a set of elements and the time for reporting the
minimum of the set.



2 Separator Trees 

In this section we define separator trees and state results about efficient
algorithms for their construction and maintenance. For further details see [S|.

Definition 1. Let $T$ be an unrooted tree with $n$ nodes. A separator tree $S_T$
for $T$ is a rooted tree on the same set of nodes, defined recursively as follows:
The root of $S_T$ is a node $u$ in $T$, called the separator node. The removal of $u$
from $T$ disconnects $T$ into disjoint trees $T_1, \ldots, T_k$, where $k$ is the number of
edges incident to $u$ in $T$. The children of $u$ in $S_T$ are the roots of separator trees
for $T_1, \ldots, T_k$.

Clearly, there are many possible separator trees $S_T$ for a given tree $T$. An
example is shown in Fig. 2.




Fig. 2. A tree T (left) and a separator tree St for T (right) 



For later use, we note the following facts for separator trees: 

Fact 1. Let $S_T$ be a separator tree for $T$, and let $v$ be a node in $T$. If $S_v$ denotes
the subtree of $S_T$ rooted at $v$, then:

1. The subgraph $T_v$ induced by the nodes in $S_v$ is a tree, and $S_v$ is a separator
tree for $T_v$.

2. For any edge from $T$ with exactly one endpoint in $T_v$, the other endpoint is
an ancestor of $v$ in $S_T$, and each ancestor of $v$ can be the endpoint of at
most one such edge.

The main point of a separator tree $S_T$ is that it may be balanced, even
when the underlying tree $T$ is not balanced for any choice of root. The notion
of balanced separator trees is contained in the following definition, where the
size $|T|$ of a tree $T$ denotes the number of nodes in $T$, and where $T_i$ refers to the
trees $T_1, \ldots, T_k$ from Definition 1.






G.S. Brodal et al. 



Definition 2. A separator tree is a $t$-separator tree, for a threshold $t \in [1/2, 1]$,
if $|T_i| \le t|T|$ for each $T_i$ and the separator tree for each $T_i$ is also a $t$-separator
tree.

In pj we first give a simple algorithm for constructing 1/2-separator trees
in time $O(n \log n)$. We then improve the running time of the algorithm to $O(n)$
by adopting additional data structures. We note that a 1/2-separator tree has
height at most $\lfloor \log n \rfloor$.
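
To make the notion concrete, the sketch below builds a 1/2-separator tree by the standard centroid decomposition, repeatedly picking a node whose removal leaves components of at most half the size. This is a generic $O(n \log n)$ construction and not necessarily the authors' algorithm; the adjacency-dictionary encoding and all function names are assumptions made for illustration.

```python
def build_separator_tree(adj):
    """Return (root, children) of a 1/2-separator tree for the tree `adj`.

    adj: dict mapping each node to the set of its neighbours (unrooted tree).
    """
    children = {v: [] for v in adj}
    removed = set()

    def separator(start):
        # Breadth-first order of the current component, with parent pointers.
        order, parent = [start], {start: None}
        for v in order:  # `order` grows while we iterate over it
            for w in adj[v]:
                if w not in removed and w not in parent:
                    parent[w] = v
                    order.append(w)
        # Subtree sizes, computed from the leaves upward.
        size = {v: 1 for v in order}
        for v in reversed(order):
            if parent[v] is not None:
                size[parent[v]] += size[v]
        total = len(order)
        for v in order:
            heaviest = total - size[v]  # the component on the parent side
            for w in adj[v]:
                if w not in removed and parent.get(w) == v:
                    heaviest = max(heaviest, size[w])
            if 2 * heaviest <= total:
                return v  # such a node always exists in a tree

    def build(start):
        u = separator(start)
        removed.add(u)
        for w in adj[u]:
            if w not in removed:
                children[u].append(build(w))
        return u

    root = build(next(iter(adj)))
    return root, children


def separator_height(root, children):
    """Number of edges on the longest root-to-leaf path of the separator tree."""
    if not children[root]:
        return 0
    return 1 + max(separator_height(c, children) for c in children[root])
```

On a path, the construction picks the middle node first and then recurses on the two halves, which matches the remark that a single path forces height about $\log n$.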

We also consider dynamic separator trees under the insertion of new nodes
into a tree $T$ and its corresponding separator tree $S_T$, and show how to maintain
separator trees with small height in logarithmic time per insertion. Our methods
for maintaining balance and height in separator trees during insertions of new
nodes are based on rebuilding of subtrees, and are inspired by methods of
Andersson and Lai described in 0121 for maintaining small height in binary search
trees. We first show how the linear time construction algorithm for 1/2-separator
trees leads to a simple algorithm for keeping separator trees well balanced. The
height bound achieved by this algorithm is $O(\log n)$, using $O(\log n)$ amortized
time per update. We then use a two-layered structure to improve the height
bound to $\log n + O(1)$ without sacrificing the time bound. The improved
constant factor in the height bound is significant for our use of separator trees for
maintaining evolutionary trees in the experiment model, since the number of
experiments for an insertion of a new species will turn out to be proportional
to the height of the separator tree. Furthermore, this height bound is within an
additive constant of the best bound possible, as trees exist where any separator
tree must have height at least $\lfloor \log n \rfloor$, e.g. a tree which is a single path.

Finally, we extend the separator trees with a specific ordering of the children,
facilitating our use of separator trees in Sect. 3 for finding insertion points for
new species in evolutionary trees. The basic idea is to speed up the search in the
separator tree by considering the children of the nodes in decreasing size-order.
This ensures a larger reduction of subtree size in the case that many children
have to be considered before the subtree in which to continue the search is found. Our
main result about separator trees is summarized in the following theorem.

Theorem 1. Let $T$ be an unrooted tree initially containing $n$ nodes. After $O(n)$
time preprocessing, an ordered separator tree for $T$ can in time $O(m \log(n + m))$
be maintained during $m$ insertions in a way such that the height is bounded by
$\log(n + m) + 5$ and such that for any path $(v_1, v_2, \ldots, v_\ell)$ from the root $v_1$ to a
node $v_\ell$ in the separator tree, the following holds

n di < 16d{n + m) , (1) 

di<2 di>2 



where $d_i$ is the number which $v_{i+1}$ has in the ordering of the children of $v_i$,
for $1 \le i < \ell$, and $d$ is $\max\{d_1, \ldots, d_{\ell-1}\}$.

3 Algorithm for Constructing and Maintaining
Evolutionary Trees

In this section we describe an algorithm for constructing an evolutionary tree $T$
in the experiment model for a set of $n$ species in time $O(nd \log_d n)$, where $d$
is the degree of the tree. Note that $d$ is not known by the algorithm in
advance. The algorithm is a further development of an algorithm by Lingas et al.
in 1^. Our algorithm also supports the insertion of new species with running
time $O(md \log_d(n + m))$ using at most $m\lceil d/2 \rceil(\log_{2\lceil d/2 \rceil - 1}(n + m) + O(1))$
experiments for $d > 2$, and at most $m(\log(n + m) + O(1))$ experiments for $d = 2$,
where $n$ is the number of species in the tree to begin with, $m$ is the number
of insertions, and $d$ is the maximum degree of the tree during the sequence of
insertions.

The construction algorithm inserts one species at a time into the tree in
time $O(d \log_d n)$ until all $n$ species have been inserted. The search for the
insertion point of a new species $a$ is guided by a separator tree $S_T$ for the internal
nodes of the evolutionary tree $T$ for the species inserted so far. The search starts
at the root of $S_T$. In a manner to be described below, we decide by experiments
which subtree, rooted at a child of the root in $S_T$, the search should continue
in. This is repeated recursively until the correct insertion point in $T$ for $a$ is
found. We keep links between corresponding nodes in $S_T$ and $T$ for switching
between the two trees. To facilitate the experiments, we maintain for each internal
node in $T$ a pointer to an arbitrary leaf in its subtree. When inserting a new
internal node in $T$ this pointer is set to point to the new leaf which caused the
insertion of the node.

We say that the insertion point of $a$ is incident to a node $v$, if

1. $a$ should be inserted directly below $v$, or

2. $a$ should split an edge which is incident to $v$ by creating a new internal node
on the edge and making $a$ a leaf below the new node, or

3. if $v$ is the root of $T$, a new root of $T$ should be created with $a$ and $v$ as its
two children.

The invariant for the search is the following. Assume we have reached node $v$
in the separator tree for the internal nodes in $T$, and let $S_v$ be the internal nodes
of $T$ which are contained in the subtree of $S_T$ rooted at $v$ (including $v$). Then
the insertion point of the new species $a$ is incident to a node in $S_v$.

Let $v$ be the node in $S_T$ for which we want to decide if the insertion point
for the new species $a$ is in the subtree above $v$ in $T$; if it is in a subtree rooted
at a child of $v$ in $T$; or if $a$ should be inserted as a new child of $v$. We denote by
$u_1, \ldots, u_k$ the children of $v$ in $T$, where $u_1, \ldots, u_{k'}$ are nodes in distinct subtrees
$T_1, \ldots, T_{k'}$ below $v$ in $S_T$, whereas $u_{k'+1}, \ldots, u_k$ are leaves in $T$ or are nodes
above $v$ in $S_T$. The order of the subtrees $T_1, \ldots, T_{k'}$ below $v$ in $S_T$ is given
by the ordered separator tree $S_T$ and determines the order of $u_1, \ldots, u_{k'}$. The
remaining children $u_{k'+1}, \ldots, u_k$ of $v$ may appear in any order.

We perform at most $\lceil k/2 \rceil$ experiments at $v$. The $i$th experiment is on the
species $a$, $b$ and $c$, where $b$ and $c$ are leaves in $T$ below $u_{2i-1}$ and $u_{2i}$ respectively.
The leaves $b$ and $c$ can be located using the pointers stored at $u_{2i-1}$ and $u_{2i}$.
Note that the least common ancestor of $b$ and $c$ in $T$ is $v$. If $k$ is odd then the
species $b$ and $c$ in the $\lceil k/2 \rceil$th experiment are chosen as leaves in $T$ below $u_k$
and $u_1$ respectively, and note that the two leaves are distinct because $k \ge 2$ by
definition. There are four possible outcomes of the $i$th experiment, corresponding
to Fig. 1:

1. (a, b, c) implies that the insertion point for a is incident to a descendent
of some $u_j$, where b and c are not descendents of $u_j$, or a is a new leaf
below v.

2. ((a, b), c) implies that the insertion point for a is incident to a descendent
of $u_{2i-1}$, since the least common ancestor of a and b is below v in T.

3. ((a, c), b) is symmetric to the above case and the insertion point of a is
incident to a descendent of $u_{2i}$ ($u_1$ for the $\lceil k/2 \rceil$'th
experiment if k is odd).

4. ((b, c), a) implies that the insertion point of a is in the subtree above v,
since the least common ancestor of a and b is above v. If v is the present root
of T, a new root should be created with children a and v.

We perform experiments for increasing i until we get an outcome different
from Case 1, or until we have performed all $\lceil k/2 \rceil$ experiments, all
with outcome as in Case 1. In the latter case species a should be inserted
directly below v in T as a new child. In the former case, when the outcome of an
experiment is different from Case 1, we know in which subtree adjacent to v in T
the insertion point for species a is located. If there is no corresponding
subtree below v in $S_T$, then we have identified the edge incident to v in T
which the insertion of species a should split. Otherwise we continue recursively
searching for the insertion point for species a at the child of v in $S_T$ which
roots the separator tree for the subtree adjacent to v which has been identified
to contain the insertion point for a. When the insertion point for species a is
found, we insert one leaf and at most one internal node into T, and $S_T$ is
updated according to Theorem 1.
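
The sequence of experiments performed at a single node v can be sketched as
follows. This is only an illustrative sketch, not the authors' implementation:
the function name search_at_node, the string labels for the four outcomes, and
the toy oracle make_oracle are assumptions introduced here, and the separator
tree itself is not modelled.

```python
def search_at_node(a, rep_leaves, experiment):
    """Run the experiments at one node v.

    rep_leaves[j] is the leaf reached through the pointer stored at child
    u_{j+1} of v.  experiment(a, b, c) is an assumed oracle returning
    "star" for (a,b,c), "ab" for ((a,b),c), "ac" for ((a,c),b) and
    "bc" for ((b,c),a).  Returns (verdict, child_index, experiments_used).
    """
    k = len(rep_leaves)
    if k < 2:
        raise ValueError("an internal node has at least two children")
    used = 0
    for i in range((k + 1) // 2):          # at most ceil(k/2) experiments
        bi = 2 * i
        # If k is odd, the last experiment pairs u_k with u_1.
        ci = 2 * i + 1 if 2 * i + 1 < k else 0
        outcome = experiment(a, rep_leaves[bi], rep_leaves[ci])
        used += 1
        if outcome == "ab":                # Case 2: continue below child bi
            return ("below_child", bi, used)
        if outcome == "ac":                # Case 3: continue below child ci
            return ("below_child", ci, used)
        if outcome == "bc":                # Case 4: insertion point is above v
            return ("above", None, used)
    return ("new_child", None, used)       # all outcomes were Case 1


def make_oracle(group):
    """Toy oracle (hypothetical): group[x] names the child subtree of v that
    holds species x, or "above" if x lies outside the subtree of v."""
    def experiment(a, b, c):
        if group[a] == group[b]:
            return "ab"
        if group[a] == group[c]:
            return "ac"
        if group[a] == "above":
            return "bc"
        return "star"
    return experiment


if __name__ == "__main__":
    group = {"x1": 0, "x2": 1, "x3": 2, "new": 2}
    print(search_at_node("new", ["x1", "x2", "x3"], make_oracle(group)))
```

With three children the sketch uses at most two experiments, matching the
$\lceil k/2 \rceil$ bound stated above.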

Lemma 1. Given an evolutionary tree T for n species with degree d, and
a separator tree $S_T$ for T according to Theorem 1, then a new species a
can be inserted into T and $S_T$ in amortized time $O(d \log_d n)$ using at most
$\lceil d/2 \rceil (\log_{2\lceil d/2 \rceil - 1} n + O(1))$ experiments for
$d > 2$, and at most $\log n + O(1)$ experiments for $d = 2$.

Proof. Let $v_1, \ldots, v_\ell$ be the nodes in $S_T$ (and T) visited by the
algorithm while inserting species a, where $v_1$ is the root of $S_T$ and
$v_{i+1}$ is a child of $v_i$ in $S_T$. Define $d_i$ by $v_{i+1}$ being the
$d_i$'th child of $v_i$ in $S_T$, for $1 \le i < \ell$.

For d = 2 we perform exactly one experiment at each $v_i$. The total number of
experiments is thus bounded by the height of the separator tree. By Theorem 1
it follows that the number of experiments is bounded by $\log n + O(1)$. In the
following we consider the case where $d \ge 3$.

For $i < \ell$, let $x_i$ denote the number of experiments performed at node
$v_i$. We have $x_i \le \lceil d/2 \rceil$ and $d_i \ge 2x_i - 1$, since each
experiment considers two children of $v_i$ in T and the first experiment also
identifies if a should be inserted into the subtree above $v_i$. At $v_\ell$ we
perform at most $\lceil d/2 \rceil$ experiments.

For $d_1, \ldots, d_{\ell-1}$ we from Theorem 1 have the constraint
$\prod_{d_i \le 2} 2 \cdot \prod_{d_i > 2} d_i \le 16dn$, since $|S_T| \le n - 1$.
To prove the stated bound on the worst case number of experiments we must
maximize $\sum_{i=1}^{\ell} x_i$ under the above constraints. We have

$$
\begin{aligned}
\log(16dn) &\ge \sum_{d_i \le 2} 1 + \sum_{d_i > 2} \log d_i \\
           &\ge \sum_{x_i = 1} 1 + \sum_{x_i > 1} \log d_i \\
           &\ge \sum_{x_i = 1} x_i + \sum_{x_i > 1} x_i \cdot \frac{1}{x_i} \log(2x_i - 1) \\
           &\ge \left( \sum_{i=1}^{\ell-1} x_i \right) \frac{1}{\lceil d/2 \rceil} \log(2\lceil d/2 \rceil - 1),
\end{aligned}
$$

where the second inequality holds since $x_i > 1$ implies $d_i \ge 3$. The last
inequality holds since for $f(x) = \frac{1}{x} \log(2x - 1)$ we have
$1 > f(2) > f(3)$ and f(x) is decreasing for $x \ge 3$, i.e. f(x) is minimized
when x is maximized.

We conclude that $\sum_{i=1}^{\ell-1} x_i \le \lceil d/2 \rceil
\log_{2\lceil d/2 \rceil - 1}(16dn)$, i.e. for the total number of experiments
we have $\sum_{i=1}^{\ell} x_i \le \lceil d/2 \rceil
(\log_{2\lceil d/2 \rceil - 1}(16dn) + 1)$.

The time needed for the insertion is proportional to the number of experiments
performed plus the time to update $S_T$. By Theorem 1 the total time is
thus $O(d \log_d n)$. □

From Lemma 1 and Theorem 1 we get the following bounds for constructing
and maintaining an evolutionary tree under the insertion of new species in the
experiment model.



Theorem 2. After O(n) preprocessing time an evolutionary tree T for n species
can be maintained under m insertions in time $O(dm \log_d(n + m))$ using at most
$m \lceil d/2 \rceil (\log_{2\lceil d/2 \rceil - 1}(n + m) + O(1))$ experiments
for $d > 2$, and at most $m(\log(n + m) + O(1))$ experiments for $d = 2$, where
d is the maximum degree of the tree during the sequence of insertions.



4 Adversary for Constructing Evolutionary Trees 

To prove a lower bound on the number of experiments required for construct- 
ing an evolutionary tree of n species with degree at most d, we describe an 
adversary strategy for deciding the outcome of experiments. The adversary is 
required to give consistent answers, i.e. the reported outcome of an experiment 
is not allowed to contradict the outcome of previously performed experiments. 
A construction algorithm is able to construct an unambiguous evolutionary tree 
based on the performed experiments when the adversary is not able to answer 
any additional experiments in such a way that it contradicts the constructed
evolutionary tree. The role of the adversary is to force any construction
algorithm to perform provably many experiments in order to construct an
unambiguous evolutionary tree.

To implement the adversary strategy for deciding the outcome of experiments
in a consistent way, the adversary maintains a rooted infinite d-ary tree, D,
where each of the n species is stored at one of the nodes, allowing nodes to
store several species. Initially all n species are stored at the root. For each
experiment performed, the adversary can move the species downwards by performing
a sequence of moves, where each move shifts a species from the node it is
currently stored at to a child of the node.

By deciding the outcome of experiments, the adversary reveals information 
about the evolutionary relationships between the species to the construction al- 
gorithm performing the experiments. The distribution of the n species on D 
represents the information revealed by the adversary (together with the for- 
bidden and conflicting lists introduced below). The evolutionary tree T to be 
established by the construction algorithm will be a connected subset of nodes 
of D including the root. Initially, when all species are stored at the root, the 
construction algorithm has no information about the evolutionary relationships. 
The evolutionary relationships revealed to the construction algorithm by the
current distribution of the species on D correspond to the tree formed by the
paths from the root of D to the nodes storing at least one species. More
precisely, the correspondence between the final evolutionary tree T and the
current distribution of the species on D is that if v is a leaf of T labeled a
then species a is stored at some node on the path in D from the root to the
node v.

Our objective is to prove that if an algorithm computes T, then the n species
on average must have been moved $\Omega(\log_d n)$ levels down by the adversary,
and that the number of moves by the adversary is a fraction $O(1/d)$ of the
number of experiments performed. These two facts imply the
$\Omega(nd \log_d n)$ lower bound on the number of experiments required.

To control its strategy for moving species on D, the adversary maintains
for each species a a forbidden list F(a) of nodes and a conflicting list C(a) of
species. If a is stored at node v, then F(a) is a subset of the children
$c_1, \ldots, c_d$ of v, and C(a) is a subset of the other species stored at v.
If $c_i \in F(a)$, then a is not allowed to be moved to child $c_i$, and if
$b \in C(a)$ then a and b must be moved to two distinct children of v. It will
be an invariant that $b \in C(a)$ if and only if $a \in C(b)$. Initially all
forbidden and conflicting lists are empty. The adversary maintains the forbidden
and conflicting lists such that the size of the forbidden and conflicting lists
of a species a is bounded by the invariant

$$|F(a)| + |C(a)| \le d - 2. \tag{2}$$

The adversary uses the sum $|F(a)| + |C(a)|$ to decide when to move a species a
one level down in D. Whenever the invariant (2) becomes violated because
$|F(a)| + |C(a)| = d - 1$, for a species a stored at a node v, the adversary
moves a to a child $c_i \notin F(a)$ of v. Since $|F(a)| \le d - 1$, such a
$c_i \notin F(a)$ is guaranteed to exist. When moving a from v to $c_i$, the
adversary updates the forbidden and conflicting lists as follows: For all
$b \in C(a)$, a is deleted from C(b) and $c_i$ is inserted into F(b). If $c_i$
was already in F(b), the sum $|F(b)| + |C(b)|$ decreases by one; if $c_i$ was
not in F(b) the sum remains unchanged. Finally, F(a) and C(a) are assigned the
empty set.
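
The bookkeeping of a single move can be sketched as follows. This is a minimal
sketch under assumptions made here for illustration: nodes of D are represented
as tuples of child indices 0..d-1 read from the root, the class name Adversary
and its attribute names are hypothetical, and the child chosen is simply the
first one not in F(a).

```python
class Adversary:
    """Toy model of the adversary's state on the infinite d-ary tree D."""

    def __init__(self, species, d):
        self.d = d
        self.node = {s: () for s in species}   # () is the root of D
        self.F = {s: set() for s in species}   # forbidden child indices
        self.C = {s: set() for s in species}   # conflicting species

    def load(self, a):
        """The sum |F(a)| + |C(a)| that triggers a move at d - 1."""
        return len(self.F[a]) + len(self.C[a])

    def move_down(self, a):
        """Move species a to a child that is not forbidden for it."""
        # Such a child exists because |F(a)| <= d - 1 < d.
        child = next(i for i in range(self.d) if i not in self.F[a])
        for b in self.C[a]:
            self.C[b].discard(a)       # a no longer shares b's node
            self.F[b].add(child)       # b may not follow a into that child
        self.node[a] = self.node[a] + (child,)
        self.F[a] = set()
        self.C[a] = set()
        return child


if __name__ == "__main__":
    adv = Adversary(["a", "b", "c"], d=4)
    adv.F["a"] = {0}
    adv.C["a"] = {"b", "c"}
    adv.C["b"] = {"a"}
    adv.C["c"] = {"a"}
    print(adv.move_down("a"), adv.node["a"], adv.F["b"], adv.C["b"])
```

Each conflicting species b trades one entry of C(b) for at most one new entry
of F(b), so its sum never grows as a result of the move.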

For two species a and b, we define their least common ancestor, LCA(a, b),
to be the least common ancestor of the two nodes storing a and b in D. We
denote LCA(a, b) as fixed if it cannot be changed by future moves of a and b
by the adversary. If LCA(a, b) is fixed then the least common ancestor of the
two species a and b in T is the node LCA(a, b). If a is stored at node $v_a$
and b is stored at node $v_b$, it follows that LCA(a, b) is fixed if and only if
one of the following four conditions is satisfied.

1. $v_a = \mathrm{LCA}(a, b) = v_b$ and $a \in C(b)$ (and $b \in C(a)$).

2. $v_a \ne \mathrm{LCA}(a, b) = v_b$ and $c_i \in F(b)$, where $c_i$ is the
child of $v_b$ such that the subtree rooted at $c_i$ contains $v_a$.

3. $v_a = \mathrm{LCA}(a, b) \ne v_b$ and $c_i \in F(a)$, where $c_i$ is the
child of $v_a$ such that the subtree rooted at $c_i$ contains $v_b$.

4. $v_a \ne \mathrm{LCA}(a, b) \ne v_b$.

In Case 1 species a and b are stored at the same node and cannot be moved
to the same child because $a \in C(b)$, i.e. LCA(a, b) is fixed as the node
which currently stores a and b. Cases 2 and 3 are symmetric. In Case 2 species a
is stored at a descendant of a child $c_i$ of the node storing b, and b cannot
be moved to $c_i$ because $c_i \in F(b)$, i.e. LCA(a, b) is fixed as the node
which currently stores b. Finally, in Case 4, species a and b are stored at
nodes in disjoint subtrees, i.e. LCA(a, b) is already fixed.

The operation Fix(a, b) ensures that LCA(a, b) is fixed as follows:

1. If $v_a = \mathrm{LCA}(a, b) = v_b$ and $a \notin C(b)$ then insert a into
C(b) and insert b into C(a).

2. If $v_a \ne \mathrm{LCA}(a, b) = v_b$ and $c_i \notin F(b)$, where $c_i$ is
the child of $v_b$ such that the subtree rooted at $c_i$ contains $v_a$, then
insert $c_i$ into F(b).

3. If $v_a = \mathrm{LCA}(a, b) \ne v_b$ and $c_i \notin F(a)$, where $c_i$ is
the child of $v_a$ such that the subtree rooted at $c_i$ contains $v_b$, then
insert $c_i$ into F(a).

Otherwise Fix(a, b) does nothing. If performing Fix(a, b) increases the sum
$|F(a)| + |C(a)|$ to $d - 1$, then a is moved one level down as described above.
Similarly, if $|F(b)| + |C(b)| = d - 1$ then b is moved one level down. After
performing Fix(a, b) we thus have that $|F(a)| + |C(a)| \le d - 2$ and
$|F(b)| + |C(b)| \le d - 2$, which ensures that the invariant (2) is not
violated.

When the construction algorithm performs an experiment on three species
a, b and c, the adversary decides the outcome of the experiment based on the
current distribution of the species on D and the content of the conflicting and
forbidden lists. To ensure the consistency of future answers, the adversary
first fixes the least common ancestors of a, b and c by applying the operation
Fix three times: Fix(a, b), Fix(a, c) and Fix(b, c). After having fixed
LCA(a, b), LCA(a, c), and LCA(b, c), the adversary decides the outcome of the
experiment by examining LCA(a, b), LCA(a, c), and LCA(b, c) in D as described
below. The four cases correspond to the four possible outcomes of an experiment
(cf. the figure showing the possible outcomes of an experiment).

1. If $\mathrm{LCA}(a, b) = \mathrm{LCA}(b, c) = \mathrm{LCA}(a, c)$ then return (a, b, c).

2. If $\mathrm{LCA}(a, b) \ne \mathrm{LCA}(b, c) = \mathrm{LCA}(a, c)$ then return ((a, b), c).

3. If $\mathrm{LCA}(a, c) \ne \mathrm{LCA}(a, b) = \mathrm{LCA}(b, c)$ then return ((a, c), b).

4. If $\mathrm{LCA}(b, c) \ne \mathrm{LCA}(a, b) = \mathrm{LCA}(a, c)$ then return ((b, c), a).
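
The four-way case distinction can be sketched as follows, assuming (as an
illustration only) that each node of D is encoded as the tuple of child indices
on its path from the root, so that the least common ancestor of two nodes is
their longest common prefix. The function names lca and outcome are introduced
here; the sketch takes the three least common ancestors as already fixed and
does not model the Fix operation.

```python
def lca(u, v):
    """Least common ancestor of two nodes of D, given as root-to-node paths."""
    n = 0
    while n < len(u) and n < len(v) and u[n] == v[n]:
        n += 1
    return u[:n]


def outcome(pa, pb, pc):
    """Outcome reported for species stored at nodes pa, pb, pc of D."""
    ab, ac, bc = lca(pa, pb), lca(pa, pc), lca(pb, pc)
    if ab == bc == ac:
        return "(a,b,c)"
    if ab != bc and bc == ac:
        return "((a,b),c)"
    if ac != ab and ab == bc:
        return "((a,c),b)"
    if bc != ab and ab == ac:
        return "((b,c),a)"
    # For three nodes of a tree, at least two of the pairwise least common
    # ancestors coincide, so one of the four cases above always applies.
    raise AssertionError("inconsistent least common ancestors")


if __name__ == "__main__":
    print(outcome((0, 0), (0, 1), (1,)))
    print(outcome((0,), (1,), (2,)))
```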

5 Lower Bound Analysis 

We will argue that the above adversary strategy forces any construction
algorithm to perform at least $\Omega(nd \log_d n)$ experiments before being
able to conclude unambiguously the evolutionary relationships between the
n species.

Theorem 3. The construction of an evolutionary tree for n species requires
$\Omega(nd \log_d n)$ experiments, where d is the degree of the constructed tree.

Proof. We first observe that an application of Fix(a, b) at most increases the
size of the two conflicting lists, C(a) and C(b), by one, or the size of one of
the forbidden lists, F(a) or F(b), by one. If performing Fix(a, b) increases the
sum $|F(a)| + |C(a)|$ to $d - 1$, then species a is moved one level down in D
and F(a) and C(a) are emptied, which causes the overall sum of the sizes of
forbidden and conflicting lists to decrease by $d - 1$. This implies that a
total of k Fix operations, starting with the initial configuration where all
conflicting and forbidden lists are empty, can cause at most $2k/(d - 1)$ moves.
Since an experiment involves three Fix operations, we can bound the total number
of moves during m experiments by $6m/(d - 1)$.

Now consider the configuration, i.e. the distribution of species and the content 
of conflicting and forbidden lists, when the construction algorithm computing 
the evolutionary tree terminates. Some species may have nonempty forbidden 
lists or conflicting lists. By forcing one additional move on each of these species 
as described in Sect. 4, we can guarantee that all forbidden and conflicting lists
are empty. At most n additional moves must be performed. 

Let T' be the tree formed by the paths in D from the root to the nodes 
storing at least one species. We first argue that all internal nodes of T' have at 
least two children. If a species has been moved to a child of a node, then the 
forbidden list or conflicting list of the species was nonempty. If the forbidden list 
was nonempty, then each of the forbidden subtrees already contained at least one 
species, and if the conflicting list was nonempty there was at least one species on 
the same node that was required to be moved to another subtree, at the latest 
by the n additional moves. It follows that if a species has been moved to a child 
of a node then at least one species has been moved to another child of the node, 
implying that T' has no node with only one child. 

We next argue that all n species are stored at the leaves of T' and that each 
leaf of T' stores either one or two species. If there is a non-leaf node in T' that 
still contains a species, then this species can be moved to at least two children 
already storing at least one species in the respective subtrees, implying that the 
adversary can force at least two distinct evolutionary trees which are consistent 
with the answers returned. This is a contradiction. It follows that all species
are stored at leaves of T'. If a leaf of T' stores three or more species, then an
experiment on three of these species can generate different evolutionary trees, 
which again is a contradiction. We conclude that each leaf of T' stores exactly 
one or two species, and all internal nodes of T' store no species. It follows that T' 
has at least n/2 leaves. 

For a tree with k leaves and degree d, the sum of the depths of the leaves is at
least $k \log_d k$. Since each leaf of T' stores at most two species, the
n species can be partitioned into two disjoint sets of size $\lceil n/2 \rceil$
and $\lfloor n/2 \rfloor$ such that in each set all species are on distinct
leaves of T'. The sum of the depths of all species is thus at least
$\lceil n/2 \rceil \log_d \lceil n/2 \rceil + \lfloor n/2 \rfloor \log_d
\lfloor n/2 \rfloor \ge n \log_d(n/2)$. Since the depth of a species in D is
equal to the number of times the species has been moved one level down in D, and
since m experiments generate at most $6m/(d - 1)$ moves and we perform at most
n additional moves, we get the inequality

$$n \log_d(n/2) \le \frac{6m}{d - 1} + n.$$

Subtracting n from both sides and multiplying by $(d - 1)/6$ gives the lower
bound $m \ge (d - 1)\, n\, (\log_d(n/2) - 1)/6$. □

Abstract. We consider the sequence comparison problem, also known
as “hidden pattern” problem, where one searches for a given subsequence 
in a text (rather than a string understood as a sequence of consecutive 
symbols). A characteristic parameter is the number of occurrences of 
a given pattern w of length m as a subsequence in a random text of 
length n generated by a memoryless source. Spacings between letters of 
the pattern may either be constrained or not in order to define valid 
occurrences. We determine the mean and the variance of the number 
of occurrences, and establish a Gaussian limit law. These results are 
obtained via combinatorics on words, formal language techniques, and 
methods of analytic combinatorics based on generating functions and 
convergence of moments. The motivation to study this problem comes 
from an attempt at finding a reliable threshold for intrusion detection,
from textual data processing applications, and from molecular biology. 



1 Introduction 

String matching and sequence comparison are two basic problems of pattern
matching known informally as “stringology”. Hereafter, by a string we mean
a sequence of consecutive symbols. In string matching, given a pattern
$w = w_1 w_2 \cdots w_m$ (of length m) one searches for some/all occurrences of
w (as a block of consecutive symbols) in a text $T_n$ of length n. The
algorithms by Knuth-Morris-Pratt and Boyer-Moore provide efficient ways of
finding such occurrences. Accordingly, the number of string occurrences in a
random text has been intensively studied over the last two decades, with
significant progress in this area being reported. For instance, Guibas and
Odlyzko have revealed the fundamental role played by autocorrelation vectors and
their associated polynomials. Régnier and Szpankowski established that
the number of occurrences of a string is asymptotically normal under a
diversity of models that include Markov chains. Nicodème, Salvy, and Flajolet
showed generally that the number of places in a random text at which a ‘motif’
(i.e., a general regular expression pattern) terminates is asymptotically
normally distributed.

In sequence comparisons, we search for a given pattern
$\mathcal{W} = w_1 w_2 \cdots w_m$ in the text $T_n = t_1 t_2 \cdots t_n$ as a
subsequence, that is, we look for indices $1 \le i_1 < i_2 < \cdots < i_m \le n$
such that $t_{i_1} = w_1, t_{i_2} = w_2, \ldots, t_{i_m} = w_m$. We also say that
the word $\mathcal{W}$ is “hidden” in the text; thus we call this the hidden
pattern problem. For example, date occurs as a subsequence in the text
hidden pattern, in fact four times, but not even once as a string. We can impose
an additional set of constraints $\mathcal{D}$ on the indices
$i_1, i_2, \ldots, i_m$ to record a valid subsequence occurrence: for a given
family of integers $d_j$ ($d_j \ge 1$, possibly $d_j = \infty$), one should have
$(i_{j+1} - i_j) \le d_j$. In other words, the allowed lengths of the “gaps”
$(i_{j+1} - i_j - 1)$ should be $< d_j$. With # representing a
‘don’t-care-symbol’ (similar to the Unix ‘*’-convention) and the subscript
denoting a strict upper bound on the length of the associated gap, a typical
pattern may look like

ab#_2 r#ac#a#d#_4 a#br#a;

there, # abbreviates #_∞ and #_1 is omitted; the meaning is that ‘ab’ should
occur first contiguously, followed by ‘r’ with a gap of < 2 symbols, followed
anywhere later in the text by ‘ac’, etc. The case when all the $d_j$’s are
infinite is called the unconstrained problem; when all the $d_j$’s are finite,
we speak of the constrained problem. The case where all $d_j$ reduce to 1 gives
back classical string matching as a limit case.
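
To make the definition concrete, the following sketch counts constrained
subsequence occurrences by dynamic programming. It is an illustration only, not
a method taken from the paper: the function name count_hidden and the
convention of writing None for $d_j = \infty$ are assumptions made here.

```python
def count_hidden(pattern, text, gaps=None):
    """Count occurrences of `pattern` as a subsequence of `text`.

    gaps[j] is the bound d_{j+1} on the index difference between consecutive
    pattern letters; None stands for an unbounded gap (d = infinity).
    With gaps=None the problem is fully unconstrained.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    if gaps is None:
        gaps = [None] * (m - 1)
    if len(gaps) != m - 1:
        raise ValueError("need exactly len(pattern) - 1 gap bounds")

    # ways[i] = number of ways to match the current pattern prefix
    # with its last letter placed at text position i (0-based).
    ways = [1 if ch == pattern[0] else 0 for ch in text]
    for j in range(1, m):
        d = gaps[j - 1]
        nxt = [0] * n
        for i in range(n):
            if text[i] != pattern[j]:
                continue
            # The previous letter must sit at i' with 1 <= i - i' <= d.
            lo = 0 if d is None else max(0, i - d)
            nxt[i] = sum(ways[lo:i])
        ways = nxt
    return sum(ways)


if __name__ == "__main__":
    print(count_hidden("date", "hidden pattern"))              # unconstrained
    print(count_hidden("date", "hidden pattern", [1, 1, 1]))   # string matching
```

The unconstrained call reproduces the four occurrences of date mentioned in the
text, and setting every bound to 1 yields zero, since date does not occur as a
string.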

Motivations. Our original motivation to study this problem came from in- 
trusion detection in the area of computer security. The problem is important 
due to the rise of attacks on computer systems. There are several approaches
to intrusion detection, but recently the pattern matching approach has found
many advocates. The main idea of this approach is
to search in an audit file (the text) for certain patterns (known also as signa- 
tures) representing suspicious activities that might be indicative of an intrusion 
by an outsider, or misuse of the system by an insider. The key to this approach 
is to recognize that these patterns are subsequences because an intrusion sig- 
nature specification requires the possibility of a variable number of intervening 
events between successive events of the signature. In practice one often needs 
to put some additional restrictions on the distance between the symbols in the 
searched subsequence, which leads to the constrained version of subsequence 
pattern matching. The fundamental question is then: How many occurrences of 
a signature (subsequence) constitute a real attack? In other words, how to set a 
threshold so that we can detect only real intrusions and avoid false alarms? It 
is clear that random (unpredictable) events occur and setting the threshold too 
low will lead to an unrealistic number of false alarms. On the other hand, setting 
the threshold too high may result in missing some attacks, which is even more 
dangerous. This is a fundamental problem that motivated our studies of hidden 
pattern statistics. By knowing the most likely number of occurrences and the 
probability of deviating from it, we can set a threshold such that with a small 
probability we miss real attacks. 

Molecular biology provides another important source of applications.
As a rule, there, one searches for subsequences, not strings. Examples are
in abundance: split genes where exons are interrupted by introns, starting and 
stopping signal in genes, tandem repeats in DNA, etc. In general, for gene search- 
ing, the constrained hidden pattern matching (perhaps with an exotic constraint 
set) is the right approach for finding meaningful information.

154 P. Flajolet et al.

The hidden pattern problem can also be viewed as a close relative of the longest
common subsequence (LCS) problem, itself of immediate relevance to computational
biology and still surrounded by mystery.

We, computer scientists and mathematicians, are certainly not the first who
invented hidden words and hidden meaning. Rabbi Akiva in the first century
A.D. wrote a collection of documents called Maaseh Merkava on secret mysticism
and meditations. In the eleventh century the Spanish Solomon Ibn Gabirol
called these secret teachings Kabbalah. Kabbalists organized themselves as a
secret society dedicated to the study of the ancient wisdom of Torah, looking
for mysterious connections and hidden truth, meaning, and words in Kabbalah and
elsewhere (without computers!). Recent versions of this activity are knowledge
discovery and data mining, bibliographic search, lexicographic research, textual
data processing, or even web site indexing. Public domain utilities like agrep,
grappe, webglimpse (developed by Manber and Wu, Kucherov, and others) depend
crucially on approximate pattern matching algorithms for subsequence detection.
Many interesting algorithms, based on regular expressions and automata, dynamic
programming, directed acyclic word graphs, digital tries or suffix trees have
been developed; the literature gives a flavour of the diversity of
approaches.

In all of the contexts mentioned above, it is of obvious interest to discern what 
constitutes a meaningful observation of pattern occurrences from what is merely 
a statistically unavoidable phenomenon (noise!). This is precisely the problem 
addressed here. We establish hidden pattern statistics, i.e., precise
probabilistic information on the number of occurrences of a given pattern
$\mathcal{W}$ as a subsequence in a random text $T_n$ generated by a memoryless
source, this in the most general case (covering the constrained and
unconstrained versions as well as mixed situations). Surprisingly enough and to
the best of our knowledge, there are no
results in the literature that address the question at this level of generality. An 
immediate consequence of our results is the possibility to set thresholds at which 
appearance of a (subsequence) pattern starts being meaningful. 

Results. Let $\Omega_n$ be the number of occurrences of a given pattern
$\mathcal{W}$ as a subsequence in a random text of length n generated by a
memoryless source (i.e., symbols are drawn independently). We investigate the
general case where we allow some of the gaps to be restricted, and others to be
unbounded. Then the most important parameter is the quantity b defined as the
number of unbounded gaps (the number of indices j for which $d_j = \infty$)
plus 1; the product D of all the finite constraints $d_j$ also plays a role. We
obtain the mean, the variance, all moments, and finally a central limit law.
Precisely, we prove in Theorem 1 that the number of occurrences has mean and
variance given by

$$\mathbb{E}[\Omega_n] \sim \frac{n^b}{b!}\, D\, \pi(\mathcal{W}), \qquad
\mathrm{Var}[\Omega_n] \sim \sigma^2(\mathcal{W})\, n^{2b-1},$$

where $\pi(\mathcal{W})$ is the probability of $\mathcal{W}$, and
$\sigma^2(\mathcal{W})$ is a computable constant that depends explicitly (though
intricately) on the structure of the pattern $\mathcal{W}$ and the constraints.
Then we prove the central limit law by moment methods, that is, we show that all
centered moments $(\Omega_n - \mathbb{E}[\Omega_n])/n^{b - 1/2}$ converge to the
appropriate moments of the Gaussian distribution (Theorem 2). We stress that,
except in the constrained case, the difficulty of the analysis lies in a
nonlinear growth of the mean and the variance so that many standard approaches
to establishing the central limit law tend to fail.

For the unconstrained problem, one has b = m, and both the mean and
the variance admit pleasantly simple closed forms. For the constrained case,
one has b = 1, while the mean and the variance become of linear growth. To
visualize the dependency of $\sigma^2(\mathcal{W})$ on $\mathcal{W}$, we observe
that, when all the $d_j$ equal 1, the problem reduces to traditional string
matching, which was extensively studied in the past as witnessed by a long (and
here incomplete) list of references. It is well known that for string matching
the variance coefficient is a function of the so-called autocorrelation of the
string. In the general case of hidden pattern matching, the autocorrelation must
be replaced by a more complex quantity that depends on the way pairs of
constrained occurrences may intersect (cf. Theorem 1).

Methodology. The way we approach the probabilistic analysis is through a 
formal description of situations of interest by means of regular languages. Ba- 
sically such a description of contexts of one, two, or several occurrences gives 
access to expectation, variance, and higher moments, respectively. A systematic 
translation into generating functions is available by methods of analytic
combinatorics deriving from the original Chomsky-Schützenberger theorem. Then,
the structure of the implied generating functions at the pole z = 1 provides the 
necessary asymptotic information. In fact, there is an important phenomenon 
of asymptotic simplification where the essentials of combinatorial-probabilistic 
features are reflected by the singular forms of generating functions. For instance, 
variance coefficients come out naturally from this approach together with, for 
each case, a suitable notion of correlation; higher moments are seen to arise from 
a fundamental singular symmetry of the problem, a fact that eventually carries 
with it the possibility of estimating moments. From there Gaussian laws eventu- 
ally result by basic moment convergence theorems. Perhaps the originality of the 
present approach lies in such a joint use of combinatorial-enumerative techniques 
and of analytic-probabilistic methods. 

2 Framework 

We fix an alphabet $\mathcal{A} := \{a_1, a_2, \ldots, a_r\}$. The text is
$T_n = t_1 t_2 \cdots t_n$. A particular matching problem is specified by a pair
$(\mathcal{W}, \mathcal{D})$: the pattern $\mathcal{W} = w_1 \cdots w_m$ is a
word of length m; the constraint $\mathcal{D} = (d_1, \ldots, d_{m-1})$ is an
element of $(\mathbb{N}^+ \cup \{\infty\})^{m-1}$.

Positions and occurrences. An m-tuple $I = (i_1, i_2, \ldots, i_m)$
($1 \le i_1 < i_2 < \cdots < i_m$) satisfies the constraint $\mathcal{D}$ if
$i_{j+1} - i_j \le d_j$, in which case it is called a position. Let
$\mathcal{P}_n(\mathcal{D})$ be the set of all positions subject to the
separation constraint $\mathcal{D}$, satisfying furthermore $i_m \le n$. An
occurrence of pattern $\mathcal{W}$ in the text $T_n$ of length n subject to the
constraint $\mathcal{D}$ is a position $I = (i_1, i_2, \ldots, i_m)$ of
$\mathcal{P}_n(\mathcal{D})$ for which $t_{i_1} = w_1, \ldots, t_{i_m} = w_m$.
For a text $T_n$ of length n, the number $\Omega_n(\mathcal{D})$ of occurrences
(of $\mathcal{W}$) subject to the constraint $\mathcal{D}$ is then a sum of
characteristic variables

$$\Omega_n(\mathcal{D}) = \sum_{I \in \mathcal{P}_n(\mathcal{D})} X_I,
\quad \text{with } X_I := [\![\mathcal{W} \text{ occurs at position } I \text{ in } T_n]\!], \tag{1}$$

where $[\![B]\!] = 1$ if the property B holds, and $[\![B]\!] = 0$ otherwise
(Iverson’s notation).

Blocks and aggregates. In the general case, the subset $\mathcal{F}$ of
indices j for which $d_j$ is finite ($d_j < \infty$) has cardinality $m - b$ with
$1 \le b \le m$. The two extreme values of b, namely, b = m and b = 1, thus
describe the (fully) unconstrained and the (fully) constrained problem
respectively. The subset $\mathcal{U}$ of indices j for which $d_j$ is unbounded
($d_j = \infty$) has cardinality $b - 1$. It separates the pattern $\mathcal{W}$
into b independent subpatterns that are called the blocks and are denoted by
$\mathcal{W}_1, \mathcal{W}_2, \ldots, \mathcal{W}_b$. All the possible $d_j$
“inside” $\mathcal{W}_r$ are finite and form the subconstraint $\mathcal{D}_r$.
In the example described in the introduction, one has b = 6 and the six blocks
are

W_1 = a#_1 b#_2 r,  W_2 = a#_1 c,  W_3 = a,  W_4 = d#_4 a,  W_5 = b#_1 r,  W_6 = a.

In the same way, an occurrence $I = (i_1, i_2, \ldots, i_m)$ of $\mathcal{W}$
subject to constraint $\mathcal{D}$ gives rise to b subpositions
$I^{[1]}, I^{[2]}, \ldots, I^{[b]}$, the rth term $I^{[r]}$ being an occurrence
of $\mathcal{W}_r$ subject to constraint $\mathcal{D}_r$. The rth block
$B^{[r]}$ is the closed segment whose end points are the extremal elements of
$I^{[r]}$, and the aggregate of position I, denoted by $\alpha(I)$, is the
collection of these b blocks. In the example of the introduction, the position

I = (6, 7, 9, 18, 19, 22, 30, 33, 50, 51, 60)

satisfies the constraint $\mathcal{D}$ and gives rise to six subpositions,

I^[1] = (6, 7, 9), I^[2] = (18, 19), I^[3] = (22), I^[4] = (30, 33), I^[5] = (50, 51), I^[6] = (60);

accordingly, the resulting aggregate $\alpha(I)$ is formed with six blocks,

B^[1] = [6, 9], B^[2] = [18, 19], B^[3] = [22], B^[4] = [30, 33], B^[5] = [50, 51], B^[6] = [60].
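
The decomposition of a position into subpositions and blocks can be reproduced
with a few lines of code. This is a sketch for illustration: the function
names, the constant ABRACADABRA_GAPS and the use of None for an unbounded gap
are assumptions introduced here.

```python
# Gap bounds d_1..d_10 of the example pattern ab#_2 r#ac#a#d#_4 a#br#a
# (letters a b r a c a d a b r a); None stands for d_j = infinity.
ABRACADABRA_GAPS = [1, 2, None, 1, None, None, 4, None, 1, None]


def satisfies(position, gaps):
    """True if consecutive indices increase and respect every finite bound."""
    return all(
        j > i and (d is None or j - i <= d)
        for i, j, d in zip(position, position[1:], gaps)
    )


def subpositions(position, gaps):
    """Cut the position at every unbounded gap, giving b subpositions."""
    parts = [[position[0]]]
    for idx, d in zip(position[1:], gaps):
        if d is None:
            parts.append([idx])        # an unbounded gap starts a new block
        else:
            parts[-1].append(idx)
    return [tuple(p) for p in parts]


def aggregate(position, gaps):
    """Closed segments [first, last] spanned by each subposition."""
    return [(p[0], p[-1]) for p in subpositions(position, gaps)]


if __name__ == "__main__":
    I = (6, 7, 9, 18, 19, 22, 30, 33, 50, 51, 60)
    print(satisfies(I, ABRACADABRA_GAPS))
    print(subpositions(I, ABRACADABRA_GAPS))
    print(aggregate(I, ABRACADABRA_GAPS))
```

A one-element block such as [22] is returned as the pair (22, 22).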

Probabilistic model. We consider a memoryless source that emits symbols
of the text independently and denote by $p_a$ ($0 < p_a < 1$) the probability of
the symbol $a \in \mathcal{A}$ being emitted. For a given length n, a random
text, denoted by $T_n$, is drawn according to the product probability on
$\mathcal{A}^n$. For instance, the pattern probability $\pi(\mathcal{W})$ is
defined by $\pi(\mathcal{W}) = \prod_{i=1}^{m} p_{w_i}$, a quantity that surfaces
throughout the analysis. Under this randomness model, the quantity
$\Omega_n(\mathcal{D})$ becomes a random variable that is itself a sum of
correlated random variables $X_I$ (defined in (1)) for all allowable
$I \in \mathcal{P}_n(\mathcal{D})$.

Generating functions. We shall consider throughout this paper structures 
superimposed on words. For a class V of structures and given a weight function c 
(induced by the probabilities of individual letters), we introduce the generating 
function 



V{z) = J2VnZ^ :=^c(r;)zH, 



n 
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where the size |f| is the number of letters involved in the structure. TheiQ, 
Vn = [z'^]V{z) is the total weight of all structures of size n. The collection of 
occurrences is described by means of regular expressions extended with disjoint 
unions, and Cartesian products. It is then known that disjoint unions and Carte- 
sian products correspond respectively to sums and products of generating func- 
tions; see nwn for a general framework. Such correspondences make it possi- 
ble to translate symbolically combinatorial descriptions into generating function 
equations and a great use is made of this in what follows. All the resulting
generating functions turn out to be rational, of the form $V(z) = P(z)/(1-z)^{k+1}$
for some integer $k \ge 0$ and polynomial $P$, so that $V_n = [z^n]V(z) = \frac{P(1)}{k!}\, n^k + O(n^{k-1})$.

3 Mean and Variance Estimates of the Number of Occurrences



Mean value analysis. The first moment analysis is easily obtained by describ- 
ing the collection of all occurrences in terms of formal languages. Let O be the 
collection of all occurrences of W as a hidden word. Each occurrence can be 
viewed as a “context” with an initial string, then the first letter of the pattern, 
then a separating string, then the second letter, etc. The collection O is then 
described by 

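$$\mathcal{O} = \mathcal{A}^* \times \{w_1\} \times \mathcal{A}^{<d_1} \times \{w_2\} \times \mathcal{A}^{<d_2} \times \cdots \times \mathcal{A}^{<d_{m-1}} \times \{w_m\} \times \mathcal{A}^*. \qquad (3)$$

There, for $d < \infty$, $\mathcal{A}^{<d}$ denotes the collection of all words of length strictly less
than $d$, i.e., $\mathcal{A}^{<d} := \bigcup_{i<d} \mathcal{A}^i$, whereas, for $d = \infty$, $\mathcal{A}^{<\infty}$ denotes the collection of all
finite words, i.e., $\mathcal{A}^{<\infty} := \mathcal{A}^* = \bigcup_{i<\infty} \mathcal{A}^i$. The associated generating functions
are

$$A_d(z) = 1 + z + z^2 + \cdots + z^{d-1} = \frac{1 - z^d}{1 - z}, \qquad A_\infty(z) = 1 + z + z^2 + \cdots = \frac{1}{1 - z}.$$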

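We now weight each occurrence by the quantity $\pi(\mathcal{W}) = E[X_I]$, so that the
generating function $O(z)$ of $\mathcal{O}$ coincides with the generating function of the
expectations $E[\Omega_n]$,

$$O(z) = \sum_{n \ge 0} E[\Omega_n]\, z^n = \left(\frac{1}{1-z}\right)^{b+1} \times \pi(\mathcal{W})\, z^m \times \prod_{j \in F} \frac{1 - z^{d_j}}{1 - z}, \qquad (4)$$

where $F$ is the set of $j$ such that $d_j < \infty$. With $\pi(\mathcal{W})$ the probability of the
pattern $\mathcal{W}$, one finds from (3) and (4):

$$E[\Omega_n] = [z^n]O(z) = \frac{n^b}{b!} \Big(\prod_{j \in F} d_j\Big)\, \pi(\mathcal{W}) \left(1 + O\!\left(\frac{1}{n}\right)\right).$$

The notation $[z^n]f(z)$ represents the coefficient of $z^n$ in the series $f(z)$.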
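The mean value estimate can be checked numerically on small inputs. The following Python sketch is not part of the original analysis: it counts hidden occurrences by dynamic programming and evaluates the leading term $\pi(\mathcal{W})\, n^b \prod_{j \in F} d_j / b!$ of the mean. The function names `count_hidden` and `mean_estimate`, the 0-based positions, and the convention that a gap bound of `None` stands for $d_j = \infty$ are assumptions made for illustration.

```python
from math import factorial, prod


def count_hidden(text, pattern, gaps):
    """Count occurrences of `pattern` as a subsequence of `text`.

    gaps[j] bounds the distance between the positions of letters j and j+1
    of the pattern (None means unbounded), so len(gaps) == len(pattern) - 1.
    """
    n = len(text)
    # ways[p] = number of partial occurrences whose last matched letter is at p
    ways = [1 if ch == pattern[0] else 0 for ch in text]
    for k in range(1, len(pattern)):
        d = gaps[k - 1]
        # prefix[p] = ways[0] + ... + ways[p - 1]
        prefix = [0] * (n + 1)
        for p in range(n):
            prefix[p + 1] = prefix[p] + ways[p]
        new_ways = [0] * n
        for p in range(n):
            if text[p] == pattern[k]:
                lo = 0 if d is None else max(0, p - d)
                # the previous letter may sit anywhere in positions lo .. p - 1
                new_ways[p] = prefix[p] - prefix[lo]
        ways = new_ways
    return sum(ways)


def mean_estimate(n, pattern, gaps, probs):
    """Leading term of E[Omega_n]: pi(W) * (product of finite d_j) * n^b / b!."""
    b = 1 + sum(d is None for d in gaps)          # number of blocks
    finite = prod(d for d in gaps if d is not None)
    pi_w = prod(probs[ch] for ch in pattern)      # pattern probability
    return pi_w * finite * n ** b / factorial(b)
```

For the text `abab` and the pattern `ab`, `count_hidden` finds 3 occurrences in the unconstrained case and 2 when the two letters must lie within distance 2 of each other. Python integers are unbounded, so the same routine copes with very large counts.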
Variance analysis. For variance and higher moment analysis, it is essential
to work with centred random variables defined as

$$Y_I := X_I - E[X_I] = X_I - \pi(\mathcal{W}), \qquad \Xi_n(\mathcal{D}) := \Omega_n(\mathcal{D}) - E[\Omega_n(\mathcal{D})] = \sum_{I \in \mathcal{P}_n(\mathcal{D})} Y_I.$$

The second moment of the centred variable $\Xi_n(\mathcal{D})$ equals the variance of $\Omega_n(\mathcal{D})$,
and with the centred variables defined above one has

$$E[\Xi_n^2(\mathcal{D})] = \sum_{I, J \in \mathcal{P}_n(\mathcal{D})} E[Y_I Y_J].$$

There are two kinds of pairs $(I, J)$ according as they intersect or not. When
$I$ and $J$ do not intersect, the corresponding random variables $Y_I$ and $Y_J$ are
independent, and the corresponding covariance $E[Y_I Y_J]$ reduces to 0. It is thus
sufficient to consider intersecting subsets $I$ and $J$. Suppose that there exist two
occurrences of pattern $\mathcal{W}$ at positions $I$ and $J$ which intersect at $\ell$ distinct places,
the $k$-th intersection point being the $r_k$-th in the natural ordering of $I$ and the
$s_k$-th in the natural ordering of $J$. (This is only possible if, for all $k$, $1 \le k \le \ell$,
one has $w_{r_k} = w_{s_k}$.) We then denote by $\mathcal{W}_{I \cap J}$ the subpattern of $\mathcal{W}$ that occurs
at position $I \cap J$, and by $\pi(\mathcal{W}_{I \cap J})$ the probability of this subpattern. Since
the expectation $E[X_I X_J]$ equals $\pi(\mathcal{W})^2 / \pi(\mathcal{W}_{I \cap J})$, the expectation
$E[Y_I Y_J] = E[X_I X_J] - \pi(\mathcal{W})^2$ involves a correlation number $e(I, J)$:

$$E[Y_I Y_J] = \pi^2(\mathcal{W})\, e(I, J), \quad \text{with} \quad e(I, J) = \frac{1}{\pi(\mathcal{W}_{I \cap J})} - 1. \qquad (5)$$

In this case, we take the pair of occurrences relative to $(I, J)$ as weighted by
$E[Y_I Y_J]$, and consider the collection $\mathcal{O}_2$ of pairs of intersecting occurrences. The
associated generating function $O_2(z)$ coincides with the generating function of
the expectations $E[Y_I Y_J]$, that is,

$$O_2(z) = \sum_{n \ge 1} \sum_{\substack{I, J \in \mathcal{P}_n(\mathcal{D}) \\ I \cap J \ne \emptyset}} E[Y_I Y_J]\, z^n = \sum_{n \ge 1} E[\Xi_n^2]\, z^n.$$

We now need to estimate $O_2(z)$ as $z \to 1$. First, define the aggregate $\alpha(I, J)$
to be the system of blocks obtained by merging together all intersecting blocks
of the two aggregates $\alpha(I)$ and $\alpha(J)$. The number of blocks $\beta(I, J)$ of $\alpha(I, J)$
plays a fundamental role here, since it measures the degree of freedom of pairs.
Since $I$ and $J$ intersect, there exists at least one block of $\alpha(I)$ that intersects
a block of $\alpha(J)$, so that $\beta(I, J)$ is at most equal to $2b - 1$. Next, we group
the sets $I, J$ according to the value of $\beta(I, J)$ and write $\mathcal{O}_2^{[p]}$ for the collection
of intersecting pairs $(I, J)$ of occurrences for which $\beta(I, J)$ equals $2b - p$. Since
there is a fundamental translation invariance, we introduce a notion of full pairs:
a pair $(I, J)$ of $\mathcal{P}_q(\mathcal{D}) \times \mathcal{P}_q(\mathcal{D})$ is full if the aggregate $\alpha(I, J)$ completely covers
the interval $[1, q]$. (Clearly, the possible values of $q$ are finite.) Then the collection
$\mathcal{O}_2^{[p]}$ is isomorphic to $(\mathcal{A}^*)^{2b-p+1} \times \mathcal{B}_2^{[p]}$, where $\mathcal{B}_2^{[p]}$ is the subset of full pairs such
that $\beta(I, J)$ equals $2b - p$. The generating function of $\mathcal{O}_2^{[p]}$ is accordingly

$$O_2^{[p]}(z) = \left(\frac{1}{1-z}\right)^{2b-p+1} \times B_2^{[p]}(z).$$

Here, $B_2^{[p]}(z)$ is the generating function of the collection $\mathcal{B}_2^{[p]}$ and, from our earlier
discussion, it is a polynomial of degree at most $2d(m-1) + 1$, with $d = \max_{j \in F} d_j$.

Now, an easy dominant pole analysis entails that $[z^n] O_2^{[p]}(z) = O(n^{2b-p})$. This
proves that the dominant contribution to the variance is given by $[z^n] O_2^{[1]}(z)$, which
is of order $n^{2b-1}$. Then, the variance $E[\Xi_n^2]$ involves the constant $B_2^{[1]}(1)$
that is the total weight of the collection $\mathcal{B}_2^{[1]}$; the polynomial $B_2^{[1]}(z)$ is itself the
generating function of the collection $\mathcal{B}_2^{[1]}$, conceptually an extension of Guibas
and Odlyzko's autocorrelation polynomial.

Since the standard deviation is of an order, $n^{b-1/2}$, that is smaller than
the mean, $O(n^b)$, concentration of distribution holds, via a well-known argument
based on Chebyshev's inequalities. In summary:



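Theorem 1. Consider a general constraint $\mathcal{D}$ and the number of occurrences
$\Omega_n = \Omega_n(\mathcal{D})$. The mean and variance of $\Omega_n$ satisfy

$$E[\Omega_n] = \frac{\pi(\mathcal{W})}{b!} \Big(\prod_{j \in F} d_j\Big)\, n^b \left(1 + O\!\left(\frac{1}{n}\right)\right),$$

$$\mathrm{Var}[\Omega_n] = \sigma^2(\mathcal{W})\, n^{2b-1} \left(1 + O\!\left(\frac{1}{n}\right)\right),$$

where $F$ is the set of $j$ such that $d_j < \infty$, and the "variance coefficient" $\sigma^2(\mathcal{W})$
involves the autocorrelation $\kappa(\mathcal{W})$:

$$\sigma^2(\mathcal{W}) = \frac{\pi^2(\mathcal{W})}{(2b-1)!}\, \kappa^2(\mathcal{W}) \quad \text{with} \quad \kappa^2(\mathcal{W}) := \sum_{(I, J) \in \mathcal{B}_2^{[1]}} \left( \frac{1}{\pi(\mathcal{W}_{I \cap J})} - 1 \right). \qquad (6)$$

The set $\mathcal{B}_2^{[1]}$ is the collection of all pairs of occurrences $(I, J)$ that satisfy three
conditions: (i) they are full; (ii) they are intersecting; (iii) there is a single pair
$(r, s)$ with $1 \le r, s \le b$ for which the $r$th block of $\alpha(I)$ and the $s$th block
of $\alpha(J)$ intersect.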
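The concentration property invoked just before Theorem 1 can be spelled out in one line. As a sketch, for any fixed $\varepsilon > 0$ (a symbol introduced here only for illustration), Chebyshev's inequality combined with the orders of growth of the mean and the variance gives:

```latex
\Pr\Big( \big|\Omega_n - E[\Omega_n]\big| \ge \varepsilon\, E[\Omega_n] \Big)
  \;\le\; \frac{\mathrm{Var}[\Omega_n]}{\varepsilon^2\, E[\Omega_n]^2}
  \;=\; O\!\left( \frac{n^{2b-1}}{n^{2b}} \right)
  \;=\; O\!\left( \frac{1}{n} \right).
```

Hence $\Omega_n / E[\Omega_n]$ tends to 1 in probability.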



Computation of the variance. The computation of the autocorrelation
$\kappa(\mathcal{W})$ reduces to $b^2$ computations of correlations $\kappa(\mathcal{W}_r, \mathcal{W}_s)$, relative to pairs
$(\mathcal{W}_r, \mathcal{W}_s)$ of blocks. Note that each correlation of the form $\kappa(\mathcal{W}_r, \mathcal{W}_s)$ involves
a totally constrained problem and can be evaluated by dynamic programming.
Precisely, one has

$$\kappa^2(\mathcal{W}) = D^2 \sum_{1 \le r, s \le b} \frac{1}{D_r D_s} \binom{r+s-2}{r-1} \binom{2b-r-s}{b-r}\, \kappa(\mathcal{W}_r, \mathcal{W}_s), \qquad (7)$$

where $\kappa(\mathcal{W}_r, \mathcal{W}_s)$ is the sum of the $e(I, J)$ taken over all full intersecting pairs
$(I, J)$ formed with an occurrence $I$ of $\mathcal{W}_r$ subject to constraint $\mathcal{D}_r$ and an
occurrence $J$ of $\mathcal{W}_s$ subject to constraint $\mathcal{D}_s$. Let us explain the formula (7) in
words: for a pair $(I, J)$ of the set $\mathcal{B}_2^{[1]}$ there is a single pair $(r, s)$ of indices with
$1 \le r, s \le b$ for which the $r$th block of $\alpha(I)$ and the $s$th block of $\alpha(J)$
intersect. Then, there exist $r + s - 2$ blocks before the block formed by merging
these two, and $2b - r - s$ blocks after it. We then have three different degrees of freedom: (i)
the relative order of the blocks of $\alpha(I)$ with index less than $r$ and the blocks of $\alpha(J)$
with index less than $s$, and similarly the relative order of the blocks of $\alpha(I)$ with index
greater than $r$ and the blocks of $\alpha(J)$ with index greater than $s$; (ii) the lengths of the
blocks (there are $D_j$ possible lengths for the $j$th block); (iii) finally the relative
positions of the $r$th block of $\alpha(I)$ and the $s$th block of $\alpha(J)$.

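In the unconstrained problem, the parameter $b$ equals $m$, and each block $\mathcal{W}_r$
is reduced to the symbol $w_r$. Then the "correlation coefficient" $\kappa^2(\mathcal{W})$ simplifies
to

$$\kappa^2(\mathcal{W}) := \sum_{1 \le r, s \le m} \binom{r+s-2}{r-1} \binom{2m-r-s}{m-r}\, \Gamma(r, s) \left( \frac{1}{p_{w_r}} - 1 \right), \qquad (8)$$

where the "autocorrelation matrix" $\Gamma$ of pattern $\mathcal{W}$ is defined by $\Gamma(r, s) := [\![ w_r = w_s ]\!]$.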



4 Central Limit Laws 



Our goal is to prove that $\Xi_n$, appropriately normalized, tends to the standard
normal distribution. We consider the following normalized random variable

$$\widetilde{\Xi}_n := \frac{\Xi_n}{n^{b-1/2}} = \frac{\Omega_n - E[\Omega_n]}{n^{b-1/2}},$$

where $b$ is the number of blocks of the constraint $\mathcal{D}$. We shall show that $\widetilde{\Xi}_n$
behaves asymptotically as a normal variable with mean 0 and standard deviation
$\sigma$. By the classical moment convergence theorem (Theorem 30.2 of Billingsley's
Probability and Measure) this is established once all moments of $\widetilde{\Xi}_n$ are known
to converge to the appropriate moments of the normal distribution. We remind
the reader that if $G$ is a standard normal variable (i.e., a Gaussian distributed
variable with mean 0 and standard deviation 1), then for any integral $s \ge 0$

$$E[G^{2s}] = 1 \cdot 3 \cdots (2s-1), \qquad E[G^{2s+1}] = 0. \qquad (9)$$

We shall accordingly distinguish two cases based on the parity of $r$, $r = 2s$ and
$r = 2s + 1$, and prove that

$$E[\Xi_n^{2s+1}] = o\big(n^{(2s+1)(b-1/2)}\big), \qquad E[\Xi_n^{2s}] \sim 1 \cdot 3 \cdots (2s-1)\, \sigma^{2s}\, n^{2sb-s}, \qquad (10)$$

which implies Gaussian convergence of $\widetilde{\Xi}_n$.

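Theorem 2. The random variable $\Omega_n$ asymptotically follows a Central Limit
Law:

$$\lim_{n \to \infty} \Pr\left\{ \frac{\Omega_n - E[\Omega_n]}{\sqrt{\mathrm{Var}[\Omega_n]}} \le x \right\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt. \qquad (11)$$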



Proof. The proof below is combinatorial; it basically reduces to grouping and
enumerating adequately the various combinations of indices in the sum that
expresses $E[\Xi_n^r]$. Once more, $\mathcal{P}_n(\mathcal{D})$ is formed of all the positions of $[1, n]$ subject
to the constraint $\mathcal{D}$ and $\mathcal{P}(\mathcal{D}) = \bigcup_n \mathcal{P}_n(\mathcal{D})$. Then totally distributing the terms
in $\Xi_n^r(\mathcal{D})$ yields

$$E[\Xi_n^r] = \sum_{(I_1, \ldots, I_r) \in \mathcal{P}_n^r(\mathcal{D})} E[Y_{I_1} \cdots Y_{I_r}]. \qquad (12)$$

An $r$-tuple of sets $(I_1, \ldots, I_r)$ in $\mathcal{P}^r(\mathcal{D})$ is said to be friendly if each $I_k$ intersects
at least one other $I_\ell$, with $\ell \ne k$, and we let $\mathcal{Q}^{(r)}(\mathcal{D})$ be the set of all friendly
collections in $\mathcal{P}^r(\mathcal{D})$. For $\mathcal{P}^r$, $\mathcal{Q}^{(r)}$, and their derivatives below, we add the
subscript $n$ each time the situation is particularized to texts of length $n$. If
$(I_1, \ldots, I_r)$ does not lie in $\mathcal{Q}^{(r)}$, then $E[Y_{I_1} \cdots Y_{I_r}] = 0$, since at least one
of the $Y_I$'s is independent of the other factors in the product and the $Y_I$'s have
been centred, $E[Y_I] = 0$. One can thus restrict attention to friendly families and
get the basic formula

$$E[\Xi_n^r] = \sum_{(I_1, \ldots, I_r) \in \mathcal{Q}_n^{(r)}(\mathcal{D})} E[Y_{I_1} \cdots Y_{I_r}], \qquad (13)$$

where the expression involves fewer terms than in (12). From there, we proceed in
two stages. First, restrict attention to friendly families that give rise to the
dominant contribution and introduce a suitable subfamily $\mathcal{Q}_*^{(r)} \subset \mathcal{Q}^{(r)}$; in so doing,
moments of odd order appear to be negligible. Next, for even order $r$, the family
$\mathcal{Q}_*^{(r)}$ involves a symmetry and it suffices to consider another smaller subfamily
$\mathcal{Q}_{**}^{(r)} \subset \mathcal{Q}_*^{(r)}$ that corresponds to a "standard" form of occurrence intersection;
this last reduction precisely gives rise to the even Gaussian moments.

Odd moments. Given $(I_1, \ldots, I_r) \in \mathcal{Q}^{(r)}$, one defines the aggregate
$\alpha(I_1, I_2, \ldots, I_r)$ as the aggregation (in the sense of the variance calculation
above) of $\alpha(I_1) \cup \cdots \cup \alpha(I_r)$. Next, the number of blocks of $(I_1, \ldots, I_r)$ is the
number of blocks of the aggregate $\alpha(I_1, \ldots, I_r)$; if $p$ is the total number of
intersecting blocks of the aggregate $\alpha(I_1, \ldots, I_r)$, the aggregate has
$rb - p$ blocks. As previously, we say that the family $(I_1, \ldots, I_r)$ of $\mathcal{Q}_q^{(r)}$ is full
if the aggregate $\alpha(I_1, I_2, \ldots, I_r)$ completely covers the interval $[1, q]$. In this case,
the length of the aggregate is at most $rd(m-1) + 1$, and the generating function
of full families is a polynomial $P_r(z)$ of degree at most $rd(m-1) + 1$ with
$d = \max_{j \in F} d_j$. Then, the generating function of families of $\mathcal{Q}^{(r)}$ whose block
number equals $k$ is of the form

$$\left(\frac{1}{1-z}\right)^{k+1} P_r(z),$$

so that the number of families of $\mathcal{Q}_n^{(r)}$ whose block number equals $k$ is $O(n^k)$. This
observation proves that the dominant contribution to (13) arises from friendly
families with a maximal block number. It is clear that the minimum number of
intersecting blocks of any element of $\mathcal{Q}^{(r)}$ equals $\lceil r/2 \rceil$, since it coincides
exactly with the minimum number of edges of a graph with $r$ vertices which
contains no isolated vertex. Then the maximum block number of a friendly
family equals $rb - \lceil r/2 \rceil$. In view of this fact and the remarks above regarding
cardinalities, we immediately have

$$E[\Xi_n^{2s+1}] = O\big(n^{(2s+1)b - s - 1}\big) = o\big(n^{(2s+1)(b-1/2)}\big),$$

which establishes the limit form of odd moments in (10).

Even moments. We are thus left with estimating the even moments. The
dominant term is relative to friendly families of $\mathcal{Q}^{(2s)}$ with an intersecting block
number equal to $s$, whose set we denote by $\mathcal{Q}_*^{(2s)}$. In such a family, each subset $I_k$
intersects one and only one other subset $I_\ell$. Furthermore, if the blocks of $\alpha(I_h)$
are denoted by $B_h^{[u]}$, $1 \le u \le b$, there exists only one block $B_k^{[u_k]}$ of $\alpha(I_k)$ and only
one block $B_\ell^{[u_\ell]}$ of $\alpha(I_\ell)$ that contain the points of $I_k \cap I_\ell$. This defines an involution
$\tau$ such that $\tau(k) = \ell$ and $\tau(\ell) = k$ for all pairs of indices $(\ell, k)$ for which
$I_k$ and $I_\ell$ intersect. Furthermore, given the symmetry relation
$E[Y_{I_1} \cdots Y_{I_{2s}}] = E[Y_{I_{\rho(1)}} \cdots Y_{I_{\rho(2s)}}]$, valid for any permutation $\rho$ of the indices,
it suffices to restrict attention to friendly families of $\mathcal{Q}_*^{(2s)}$ for
which the involution $\tau$ is the standard one with cycles (1, 2), (3, 4), etc.; for such
"standard" families, whose set is denoted by $\mathcal{Q}_{**}^{(2s)}$, the pairs that intersect are
thus $(I_1, I_2), \ldots, (I_{2s-1}, I_{2s})$. Since the set $\mathcal{J}_{2s}$ of involutions of $2s$ elements
without fixed points has cardinality $K_{2s} = 1 \cdot 3 \cdot 5 \cdots (2s-1)$, the equality

$$\sum_{\mathcal{Q}_{*n}^{(2s)}} E[Y_{I_1} \cdots Y_{I_{2s}}] = K_{2s} \sum_{\mathcal{Q}_{**n}^{(2s)}} E[Y_{I_1} \cdots Y_{I_{2s}}], \qquad (14)$$

entails that we can work now solely with standard families.
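The cardinality $1 \cdot 3 \cdot 5 \cdots (2s-1)$ used above counts the ways of splitting $2s$ indices into $s$ unordered pairs, and it coincides with the even Gaussian moment in (9). The short Python sketch below (not from the original text; the names `pairings` and `odd_double_factorial` are chosen here for illustration) enumerates the pairings explicitly so that the count can be compared with the product formula.

```python
def pairings(elements):
    """Yield every way to split the list `elements` into unordered pairs."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    # Pair the first element with each possible partner, then recurse.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def odd_double_factorial(s):
    """Return 1 * 3 * 5 * ... * (2s - 1)."""
    result = 1
    for k in range(1, 2 * s, 2):
        result *= k
    return result
```

For $s = 2$ the three pairings of the indices 1, 2, 3, 4 include the "standard" one with pairs (1, 2) and (3, 4).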

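The class of occurrences relative to standard families is $(\mathcal{A}^*)^{2sb-s+1} \times \mathcal{B}_{2s}^{[s]}$,
and involves the collection $\mathcal{B}_{2s}^{[s]}$ of full friendly $2s$-tuples of occurrences
with a number of intersecting blocks equal to $s$. Since $\mathcal{B}_{2s}^{[s]}$ is exactly a shuffle of $s$ copies
of $\mathcal{B}_2^{[1]}$ (as introduced in the study of the variance), the associated generating
function is

$$\left(\frac{1}{1-z}\right)^{2sb-s+1} (2sb-s)! \left(\frac{B_2^{[1]}(z)}{(2b-1)!}\right)^{s},$$

where $B_2^{[1]}(z)$ is the already introduced autocorrelation polynomial. Upon taking
coefficients, we obtain the estimate

$$\sum_{\mathcal{Q}_{**n}^{(2s)}} E[Y_{I_1} \cdots Y_{I_{2s}}] \sim n^{(2b-1)s}\, \sigma^{2s}.$$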

In view of the formulae above, this yields the estimate
of even moments and leads to the second relation of (10). (Note that the even
Gaussian moments eventually come out of the number of involutions, which
corresponds to a fundamental symmetry present in the problem.) This completes
the proof of Theorem 2.

5 Conclusion

As a test case, we took the full text of Hamlet where all nonalphabetic characters
are suppressed. This gives us a (rather unpoetical looking) text that has one long
line with 30,316 words and $n = 120{,}057$ alphabetical characters: "who s there
nay answer me stand and unfold yourself long live the king bernardo he you
come most carefully upon your hour [...]". The pattern is "The law is Gaussian"
[$w$ = thelawisgaussian] and its mirror image $\tilde{w}$, corresponding to $m = 16$. Based
on the empirical distribution of letter frequencies in the text, we anticipate the
pattern to appear $1.330 \times 10^{48}$ times as a subsequence, while the observed counts
are $1.365 \times 10^{48}$ and $1.388 \times 10^{48}$, a deviation of about 4% from what is expected.
Similarly, if we bound the separation distance between any two letters by $d$,
analysis predicts that the pattern might start occurring near $d = 10$, while its
presence is unlikely for smaller values, $d < 10$. In fact, $w$ starts occurring at
$d = 14$ while $\tilde{w}$ starts at $d = 13$, a deviation of some 30-40% from what the
model predicts. Here is a table of observed versus predicted values when $d$ varies:





                       $w$ = thelawisgaussian       $\tilde{w}$ = naissuagsiwaleht
  d    Expected (E)    Occurred (Ω)      Ω/E     Occurred (Ω)      Ω/E
  13   9.195E+01       0                 0.00    18                0.19
  14   2.794E+02       693               2.47    371               1.32
  20   5.886E+04       124,499           2.11    41,066            0.69
  50   5.482E+10       76,146,232,395    1.38    48,386,404,680    0.88
  ∞    1.330E+48       1.36554E+48       1.03    1.38807E+48       1.04



This (together with many other experiments) shows a fair fit between the the- 
oretical model and the observed data even though the text chosen is far from 
being “random” . 

Extensions. For the constrained case where all the distances are finite, based 
on finite state models and the de Bruijn graph, it is possible to obtain local 
limit laws (i.e., a direct estimation of probability densities), a characterization 
of the speed of convergence to the asymptotic limit (it is as well as large 

deviation estimates (that are exponentially small); see the full paper. For the 
unconstrained case, the corresponding problems appear to be related to products 
of random matrices and to the difficult case of random walks on nilpotent Lie 
groups; see Guivarc’h’s paper El for context and references. Finally, preliminary 
investigations indicate that the methods developed here apply to Markovian 
sources and more generally to all dynamical sources in the sense of Vallee jEE2|. 
Low-Discrepancy Roundings of a Real Sequence

Abstract. In this paper, we discuss the problem of computing all the
integral sequences obtained by rounding an input real valued sequence 
such that the discrepancy between the input sequence and each output 
integral sequence is less than one. We show that the number of such 
roundings is n + 1 if we consider the discrepancy with respect to the 
set of all subintervals, and give an efficient algorithm to report all of 
them. Then, we give an optimal method to construct a compact graph 
to represent the set of global roundings satisfying a weaker discrepancy 
condition. 



1 Introduction 

For a given real number $a$, its rounding is either $\lfloor a \rfloor$ or $\lceil a \rceil$. Given a sequence
$a = (a_i)_{1 \le i \le n}$ of real numbers, its rounding is an integral sequence $b = (b_i)_{1 \le i \le n}$
such that each entry $b_i$ is a rounding of $a_i$. Without loss of generality, we can
assume that each entry of $a$ is in the closed interval $[0, 1]$. Thus, the rounding
of $a$ becomes a binary array.

There are $2^n$ possible roundings of a given $a$, and we would like to compute
good-quality roundings with respect to a given criterion. The problem is not only 
combinatorially interesting but also related to coding theory, data compression, 
computer vision, operations research, and Monte Carlo simulation. 

In order to give a criterion to determine quality of roundings, we define a
distance in the space $\mathcal{A}$ of all $[0, 1]$-valued sequences of $n$ real numbers. For an
element $a \in \mathcal{A}$, let $a(I) = \sum_{i \in I} a_i$ be the sum of entries of $a$ whose indices
are located in an interval $I \subset [1, n]$. We fix a family $\mathcal{F}$ of intervals. The $\ell_\infty$
distance between two elements $a$ and $a'$ in $\mathcal{A}$ with respect to $\mathcal{F}$ is defined by

$$\mathrm{Dist}^{\mathcal{F}}_{\infty}(a, a') = \max_{I \in \mathcal{F}} |a(I) - a'(I)|.$$

$\mathrm{Dist}^{\mathcal{F}}_{\infty}(a, b)$ is the rounding error of a rounding $b$ of a given $[0, 1]$-valued
sequence $a$ measured by using the distance. The supremum of the optimal rounding
error $\sup_{a \in \mathcal{A}} \min_{b \in \mathcal{B}} \mathrm{Dist}^{\mathcal{F}}_{\infty}(a, b)$ is called the inhomogeneous discrepancy of $\mathcal{A}$
with respect to the family $\mathcal{F}$. Here, $\mathcal{B}$ is the set of all binary valued sequences
of length $n$. The most popular case is where $\mathcal{F}$ is the set $\mathcal{I}_n$ of all integral
subintervals of $[1, n]$, and the discrepancy of $\mathcal{A}$ with respect to $\mathcal{I}_n$ is sometimes called
the 1-dimensional discrepancy in the literature.

Abusing the notation, we often call the error $\mathrm{Dist}^{\mathcal{F}}_{\infty}(a, b)$ the discrepancy
between $a$ and $b$ with respect to $\mathcal{F}$.

We say that a rounding $b$ of $a$ is an $\mathcal{F}$-global rounding if $\mathrm{Dist}^{\mathcal{F}}_{\infty}(a, b) < 1$
holds; in other words, $b$ is a global rounding of $a$ if and only if $b(I)$ is a rounding
of $a(I)$ for every $I \in \mathcal{F}$. It is known that for any $\mathcal{F}$, an $\mathcal{F}$-global rounding exists.
On the other hand, for any constant $\varepsilon > 0$, there exists an input $a$ which has no
rounding with a discrepancy less than $1 - \varepsilon$ even if we consider the family of all
intervals of length 2.
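The definitions of the distance and of a global rounding translate directly into code. The following Python sketch is an illustration only: the helper names `intervals_up_to`, `dist` and `is_global_rounding` are introduced here, intervals are represented as 1-based inclusive pairs `(i, j)`, and entries are assumed to be `Fraction` values so that the strict inequality is tested exactly.

```python
from fractions import Fraction


def intervals_up_to(n, k):
    """All integer intervals [i, j] inside [1, n] whose length is at most k."""
    return [(i, j)
            for i in range(1, n + 1)
            for j in range(i, min(n, i + k - 1) + 1)]


def dist(a, b, family):
    """Largest |a(I) - b(I)| over the intervals I = (i, j) of `family`."""
    return max(abs(sum(a[i - 1:j]) - sum(b[i - 1:j])) for i, j in family)


def is_global_rounding(a, b, family):
    """True when the binary sequence b has discrepancy below 1 w.r.t. `family`."""
    return dist(a, b, family) < 1
```

With `a = [Fraction(2, 5)] * 4` and the family of all intervals, the sequence `[0, 1, 0, 1]` has discrepancy 4/5 and is therefore a global rounding, whereas `[1, 1, 0, 0]` is not, because its first two entries sum to 2 against 4/5.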

There are two classical algorithms each of which computes an iF-global round- 
ing (the output sequence depends on the algorithm): One is the error-diffusion 
algorithm, and the other is Viterbi’s decoding algorithm (outlines are given in 
the appendix) . Moreover, Asano et al. ^ have recently shown that for any given 
input sequence a, a binary sequence b minimizing the discrepancy can be com- 
puted in time 0{y/n\J-\log^ n), where \T\ is the cardinality of iF, and hence 
O(n^). 

A major defect of the above algorithms is that each of them outputs only
one particular $\mathcal{F}$-global rounding. This lack of flexibility causes serious
problems in applications such as image processing. Therefore, it is desirable
to design efficient algorithms that output either (1) all $\mathcal{F}$-global roundings or (2)
a system from which one can efficiently select a given number of $\mathcal{F}$-global roundings
uniformly at random from the set of all global roundings.

In this paper, we consider the family $\mathcal{I}_k$ consisting of all intervals of length
at most $k$ in $[1, n]$. The family is natural and important in several applications.
We first consider the special case where $k = n$, and show that we can report all
$\mathcal{I}_n$-global roundings in $O(n^2)$ time. This implies that the number of global
roundings is bounded by a polynomial; indeed, it is at most $n + 1$, and exactly
$n + 1$ under a non-degeneracy condition. Next, we give an $O(nk)$ time algorithm
to output an acyclic network with $O(nk)$ nodes so that the set of all $\mathcal{I}_k$-global
roundings equals the set of all directed $s$-$t$ paths in the network. As byproducts,
we show that several optimization rounding problems that can be solved in
$O(2^k q n)$ time by using Viterbi's dynamic programming algorithm can be solved
in $O(kqn)$ time if we restrict the solution space to the set of global roundings. Here,
$q$ is the time to do some basic operations depending on problems. This includes
an improved $O(nk)$ time complexity of computing the rounding $b$ minimizing
$\mathrm{Dist}^{\mathcal{I}_k}_{\infty}(a, b)$.

The present paper mainly focuses on theoretical aspects of the problem;
however, our motivation comes from digital halftoning, which is one of the most
fundamental techniques in image processing. An intensity image can be
considered as a $[0, 1]$-valued $n \times n$ array $A$ where each entry $a_{ij}$ corresponds to
a brightness level (gray level) of the $(i, j)$ pixel of the pixel grid. For a color
image, we consider an overlay of three such matrices representing red, green,
and blue color components, respectively. Digital halftoning computes
a binary $n \times n$ array $B$ "approximating" $A$. The intention of this method is
to convert a given image which consists of several bits for brightness levels into
a binary image having only black and white pixels. This kind of technique is
indispensable to print an image on an output device that produces black dots
only, such as facsimiles and laser printers. The problem is not easy; for example,
neither simple rounding nor randomized rounding (round each entry $a_{ij}$ to 1
with probability $a_{ij}$) generates a good halftoning image.

Up to now, a large number of methods and algorithms for digital halftoning
have been proposed (see, e.g., [8,4,9,10]). The ordered dither method and the
two-dimensional error diffusion method are quite popular methods. By the
nature of the problem, we need the help of human judgment to assess the quality of
halftoning; however, a nice mathematical measurement for automatically
evaluating the quality is desired. Discrepancy is a nice mathematical measurement for
the halftoning. However, the two-dimensional rounding problem minimizing the
discrepancy is NP-hard, and even its approximation is theoretically difficult.



The concept and algorithms for global roundings given in this paper will
be useful tools for designing nice halftoning methods. Every $\mathcal{I}_k$-global rounding
(for a suitable $k$) gives a good quality rounding for each row. However, if we
further consider the side-effect, it is not wise to round each row independently
and combine them, since it often causes some systematic patterns (that do not
exist in the input image) in the output image: Such a pattern is called a regular
pattern created by a rounding.

We can avoid generating regular patterns if we have many candidate global
roundings for each row and select a suitable one considering the relation to the
neighbor rows. Even a random choice of a global rounding works well in our
preliminary experiments: Compared to the randomized rounding, the method
to choose a global rounding randomly in each row decreases the randomness,
and hence tends to keep features of the original image better. Moreover, we can
consider several bicriteria optimization problems to compute a global rounding
of each row that simultaneously minimizes two-dimensional side effects.



2 Structure of the Set of Global Roundings 

2.1 Preliminaries 

Let $S(a, \mathcal{F})$ be the set of all $\mathcal{F}$-global roundings of $a$, and let $N(a, \mathcal{F}) = |S(a, \mathcal{F})|$
be the number of different roundings. The discrepancy satisfies the monotonicity
by definition; i.e., $\mathrm{Dist}^{\mathcal{F}}_{\infty}(a, b) \ge \mathrm{Dist}^{\mathcal{F}'}_{\infty}(a, b)$ if $\mathcal{F} \supset \mathcal{F}'$. Therefore,
$S(a, \mathcal{F}) \subset S(a, \mathcal{F}')$ if $\mathcal{F} \supset \mathcal{F}'$.

For a sequence $c$ of length $n$, let $c(\le k)$ be its prefix of length $k$. Thus, $a(\le k)$
is the prefix of the input sequence $a$ of length $k$. Abusing the notation, we say
that a binary sequence of length $k$ is an $\mathcal{F}$-global rounding of a prefix of $a$ if it is
a global rounding of $a(\le k)$ with respect to $\mathcal{F}(\le k) = \{I \cap [1, k] : I \in \mathcal{F}\}$. The
following lemma is trivial, but useful:

Lemma 1. The prefix of length $k$ of an $\mathcal{F}$-global rounding $b$ of $a$ is an $\mathcal{F}$-global
rounding of the prefix $a(\le k)$ of $a$. Moreover, for every $\mathcal{F}$-global rounding $c$ of
a prefix $a(\le k)$, its prefix of length $\ell < k$ is an $\mathcal{F}$-global rounding of the prefix
$a(\le \ell)$.

Definition 1. A family $\mathcal{F}$ is called prefix-complete if for any $m \le n$ and for
any $I \in \mathcal{F}$, $I \cap [1, m] \in \mathcal{F}$.

We mainly consider prefix-complete families in this paper. Obviously, $\mathcal{I}_k$,
which we focus on, is a prefix-complete family.

2.2 Rounding Graph 

Definition 2. A rounding graph of $a$ with respect to $\mathcal{F}$ is a directed acyclic
graph $G$ with a source node such that each edge contains either 0 or 1 as a label,
every path from its source to a sink gives a global rounding (if we read the labels
at edges on the path sequentially) of $a$, and every global rounding appears as such
a path.

There may be several different rounding graphs for a set of global roundings.
We first consider one particular rounding graph (indeed, a binary tree) of an
input sequence $a$ with respect to a prefix-complete family $\mathcal{F}$ of intervals. The
graph is often called the keyword tree in the literature, if we consider the set
of global roundings as a set of binary keywords. See Figure 1 for an example.

The construction is as follows: We denote by $b \cdot 0$ and $b \cdot 1$ the sequences
obtained by appending 0 and 1 to the end of $b$, respectively. We consider a
node $v(c)$ associated with an integral sequence $c$, and let $V(a, \mathcal{F}) = \{v(c) :
c$ is an $\mathcal{F}$-global rounding of a prefix of $a\}$. Here, we use a convention that $\emptyset$ is a
global rounding of the empty "prefix" of length 0 of $a$. Consider a graph $T(a, \mathcal{F})$,
which has $V(a, \mathcal{F})$ as its node set, and has an arc from $v(c)$ to $v(d)$ if and only if
either $d = c \cdot 0$ or $d = c \cdot 1$: the arc has 0 (resp. 1) as its label in the former (resp.
latter) case. The following lemma is immediately obtained from the construction
and the definition of a prefix-complete family:

Lemma 2. $T(a, \mathcal{F})$ is a binary directed tree rooted at $v(\emptyset)$ such that if we read
the labels at edges on the path from $v(\emptyset)$ to a node $v(c)$ sequentially, we have the
binary string $c$.

The depth of the tree $T(a, \mathcal{F})$ is $n$ by the construction, and we ignore the
leaves at shallower levels, if any. To be precise, let $\hat{T}(a, \mathcal{F})$ be the induced subgraph
of $T(a, \mathcal{F})$ consisting of nodes on the paths from leaves of level $n$ towards the
root. $\hat{T}(a, \mathcal{F})$ is a rounding graph, since the set of paths from the root to leaves
of depth $n$ is exactly the set of $\mathcal{F}$-global roundings. Note that the size of the tree
may be exponential in general.

3 $\mathcal{I}_n$-Global Roundings

3.1 Combinatorial Results 

Fig. 1. The rounding graph $T(a, \mathcal{F})$, where $\mathcal{F} = \mathcal{I}_4$ and $a = (0.4, 0.4, 0.4, 0.4)$.

We consider the case where $\mathcal{F} = \mathcal{I}_n$. If $N(a, \mathcal{F})$ is very large (say, exponential
in $n$), we have no hope to report all the $\mathcal{F}$-global roundings in polynomial time.
The following lemma is easy to prove, but it was a surprising discovery for the
authors:



Lemma 3. For any real sequence $a$ of length $n$, $N(a, \mathcal{I}_n) \le n + 1$.

Proof. We prove the lemma by induction on $n$. If $n = 1$, the lemma is
trivial. Suppose that the statement holds for each sequence of length less than
$n$. For each rounding $b \in S(a, \mathcal{I}_n)$, we can observe that
$b(\le n-1) \in S(a(\le n-1), \mathcal{I}_{n-1})$. A pair of binary sequences $b$ and $b'$ is called a prefix-sharing pair
if $b(\le n-1) = b'(\le n-1)$. We claim that there is at most one prefix-sharing
pair in $S(a, \mathcal{I}_n)$.

Assume that the claim is false. Thus, we have $b, b', c, c' \in S(a, \mathcal{I}_n)$ such that
$b(\le n-1) = b'(\le n-1)$, $c(\le n-1) = c'(\le n-1)$, and $b(\le n-1) \ne c(\le n-1)$.
We can assume that the last entries of $b$ and $c$ are 1, while those of $b'$ and $c'$
are 0. Since $b(\le n-1) \ne c(\le n-1)$, there exists an interval $[j, n-1]$
such that $b([j, n-1]) \ne c([j, n-1])$. From the definition of the global rounding,
$|b([j, n-1]) - c([j, n-1])| = 1$ and without loss of generality, we can assume that
$b([j, n-1]) = c([j, n-1]) + 1$. Thus, $b([j, n]) - c'([j, n]) = 2$; however, because
of the definition of a global rounding, $b([j, n]) < a([j, n]) + 1$ and
$c'([j, n]) > a([j, n]) - 1$, and hence $b([j, n]) < c'([j, n]) + 2$. This gives a contradiction.

From this claim, we have $N(a, \mathcal{I}_n) \le N(a(\le n-1), \mathcal{I}_{n-1}) + 1 \le \{(n-1) + 1\} + 1 = n + 1$,
and the lemma is proved. □

Definition 3. A real sequence $a$ is called non-degenerate if $a(I)$ is non-integral
for every interval $I \in \mathcal{I}_n$.



Lemma 4. $\hat{T}(a, \mathcal{I}_n) = T(a, \mathcal{I}_n)$, and if the sequence $a$ is non-degenerate,
$N(a, \mathcal{I}_n) = n + 1$.

Proof. First, we show that for any $k \le n - 1$ and any sequence
$b \in S(a(\le k), \mathcal{I}_k)$, either $b \cdot 0$ or $b \cdot 1$ is a member of $S(a(\le k+1), \mathcal{I}_{k+1})$. This implies
that there is no leaf in $T(a, \mathcal{I}_n)$ in a level with depth $k \le n - 1$, and hence
$T(a, \mathcal{I}_n) = \hat{T}(a, \mathcal{I}_n)$.

Assume that there exists $b \in S(a(\le k), \mathcal{I}_k)$ such that neither $b \cdot 0$ nor
$b \cdot 1$ is a member of $S(a(\le k+1), \mathcal{I}_{k+1})$. Thus, there exist indices $i \le k$ and
$j \le k$ such that $b([i, k]) + 1 \ge a([i, k+1]) + 1$ and $b([j, k]) \le a([j, k+1]) - 1$.
Therefore, if $i < j$, we have $b([i, j-1]) \ge 1 + a([i, j-1])$, and otherwise, we have
$b([j, i-1]) \le -1 + a([j, i-1])$. This is a contradiction, since $|a(I) - b(I)| < 1$
for every interval $I$.

Next, we show that if $a$ is non-degenerate, there always exists
$b \in S(a(\le k), \mathcal{I}_k)$ such that both of $b \cdot 0$ and $b \cdot 1$ are members of $S(a(\le k+1), \mathcal{I}_{k+1})$. For
this purpose, we use a variant of the error diffusion algorithm (see Appendix)
processed in a reverse order starting from $k$ to compute a sequence $b$ such that
$0 > b([j, k]) - a([j, k+1]) > -1$ for every $j = k-1, k-2, \ldots, 2, 1$. It is
not difficult to see that there always exists such a sequence. Because of our
assumption that $a(I)$ is not integral, this implies that both of $b \cdot 0$ and $b \cdot 1$ are
in $S(a(\le k+1), \mathcal{I}_{k+1})$. Thus, $|S(a(\le k+1), \mathcal{I}_{k+1})| \ge |S(a(\le k), \mathcal{I}_k)| + 1$, and
we have $N(a, \mathcal{I}_n) \ge n + 1$. Combined with the previous lemma, the inequality
must be an equality. □

These two lemmas imply that, if we apply a symbolic perturbation method
to modify the input sequence $a$ such that $a(I)$ is non-integral for every $I$, we
can always have exactly $n + 1$ global roundings of $a$ with respect to $\mathcal{I}_n$.

One natural question is whether we can obtain a polynomial bound of the
number of binary sequences if we relax the discrepancy bound. The answer is
negative: suppose that we consider the relaxed condition $\mathrm{Dist}^{\mathcal{I}_n}_{\infty}(a, b) \le 1$,
instead of $\mathrm{Dist}^{\mathcal{I}_n}_{\infty}(a, b) < 1$. Consider the input sequence $a$ of even length whose
every entry is 0.5. Then, we can observe that every binary sequence satisfying
$b_{2i-1} + b_{2i} = 1$ for $i = 1, 2, \ldots, n/2$ is included in the solution set. There
are $2^{n/2}$ such sequences.
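Both the bound of $n + 1$ and the blow-up under the relaxed condition can be observed on tiny inputs by exhaustive search. The Python sketch below is only a brute-force check, exponential in $n$, and is not the reporting algorithm of this paper; the function name `brute_force_roundings` and its `strict` flag are introduced here for illustration, and the entries of `a` are assumed to be `Fraction` values so that the boundary case $|a(I) - b(I)| = 1$ is decided exactly.

```python
from fractions import Fraction
from itertools import product


def brute_force_roundings(a, strict=True):
    """Return all binary sequences b with |a(I) - b(I)| < 1 on every interval I.

    With strict=False the relaxed condition |a(I) - b(I)| <= 1 is used instead.
    """
    n = len(a)
    found = []
    for b in product((0, 1), repeat=n):
        ok = True
        for i in range(n):
            err = Fraction(0)
            for j in range(i, n):
                err += b[j] - a[j]          # err = b([i, j]) - a([i, j])
                if abs(err) > 1 or (strict and abs(err) == 1):
                    ok = False
                    break
            if not ok:
                break
        if ok:
            found.append(b)
    return found
```

For $a = (0.4, 0.4, 0.4, 0.4)$ the search returns five sequences, in agreement with Lemma 4. For the all-0.5 input of length 4, the strict condition leaves only the two alternating sequences, while the relaxed one admits at least the four sequences with $b_1 + b_2 = b_3 + b_4 = 1$.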

3.2 Algorithm for Reporting all X„-Global Roundings 

For the family X„ of all intervals, we compute all n -I- 1 sequences. We indeed 
construct the rounding graph T = T(a,X„) in O(n^) time and 0{n) working 
space (ignoring the space to store the tree). The tree T is a binary tree of height 
n with at most n -I- 1 leaves, and it has 0(n^) nodes. 

For simplicity, in this subsection we simply write "global rounding" for an
$\mathcal{I}_n$-global rounding. For each global rounding $c$ of a prefix (say, $a(\le i)$)
of $a$, let $\mathrm{diff}(c) = a([1,i]) - c([1,i])$. We define
$\mathrm{maxdiff}(c) = \max\{\mathrm{diff}(d) : d \text{ is a prefix of } c\}$ and
$\mathrm{mindiff}(c) = \min\{\mathrm{diff}(d) : d \text{ is a prefix of } c\}$.

Starting from 0, we construct the tree from top to bottom, increasing the 
depth one by one. The level which is under construction in the algorithm is 
called the current level. If the current level has depth $i$, we construct nodes
corresponding to global roundings of $a(\le i)$. We compute $\mathrm{diff}(c)$, $\mathrm{maxdiff}(c)$,
and $\mathrm{mindiff}(c)$ for the nodes in the current level of the tree by using the
information of the previous level. Note that $\mathrm{maxdiff}(c) < \mathrm{mindiff}(c) + 2$ holds.

Suppose that the current level is at depth $i$, and let $v(c)$ be a node of $T$ with
depth $i-1$ (the level with depth $i-1$ has already been constructed). We want
to decide whether $c \cdot 0$ and/or $c \cdot 1$ are global roundings of $a(\le i)$. The following
result is obtained in a routine way from the definition of a global rounding:

Lemma 5. Let $\hat{c}$ be either $c \cdot 0$ or $c \cdot 1$. The sequence $\hat{c}$ is a global rounding of
$a(\le i)$ if and only if $\mathrm{maxdiff}(c) - 1 < \mathrm{diff}(\hat{c}) < \mathrm{mindiff}(c) + 1$.

Since $\mathrm{diff}(c \cdot 0) = \mathrm{diff}(c) + a_i$ and $\mathrm{diff}(c \cdot 1) = \mathrm{diff}(c) + a_i - 1$ (by the
definition $\mathrm{diff}(c) = a([1,i]) - c([1,i])$), they can be computed in $O(1)$ time.
Thus, we can decide in $O(1)$ time whether the extension $\hat{c} \in \{c \cdot 0, c \cdot 1\}$ is
a global rounding or not. It is easy to see that $\mathrm{maxdiff}(\hat{c})$ and $\mathrm{mindiff}(\hat{c})$ can
be computed in $O(1)$ time. Hence, we spend $O(1)$ time to create a node in the
graph $T$. Thus, the time complexity of our algorithm is $O(n^2)$. Since we only
use the information stored in the $(i-1)$-th level to compute the $i$-th level, we
use $O(n)$ working space (ignoring the space to store the output).
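
The level-by-level construction can be sketched as follows. This is a minimal illustration under two assumptions of ours: the empty prefix is counted among the prefixes (with diff 0), and each node stores its full bit sequence for readability, which costs more space than the working-space bound discussed above. The function name is hypothetical.

```python
from fractions import Fraction


def global_roundings(a):
    """List all binary b with |a(I) - b(I)| < 1 for every interval I."""
    # Each node stores (bits, diff, maxdiff, mindiff); the root is the
    # empty sequence, whose diff is 0.
    level = [((), Fraction(0), Fraction(0), Fraction(0))]
    for entry in a:
        next_level = []
        for bits, diff, maxdiff, mindiff in level:
            for bit in (0, 1):
                new_diff = diff + entry - bit  # diff of the extended sequence
                # Test of Lemma 5 against the extremes of the parent node.
                if maxdiff - 1 < new_diff < mindiff + 1:
                    next_level.append((bits + (bit,), new_diff,
                                       max(maxdiff, new_diff),
                                       min(mindiff, new_diff)))
        level = next_level
    return [bits for bits, _, _, _ in level]
```

Each candidate extension is accepted or rejected with a constant number of comparisons, which mirrors the $O(1)$ cost per node claimed in the text.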

3.3 Compact Rounding Graph for a Smaller Family of Intervals 

In some applications, we do not care about very long intervals. Hence, instead of
$\mathcal{I}_n$, we would like to consider $\mathcal{I}_k$ for $k < n$. Unfortunately, the number of
$\mathcal{I}_k$-global roundings is $\Omega((k+1)^{\lfloor n/2k \rfloor})$, and hence exponential in $n/2k$.
Therefore, it is too expensive to report all the $\mathcal{I}_k$-global roundings explicitly.
Instead, we construct a rounding graph of size $O(nk)$, so that we can generate
global roundings in a uniformly random fashion.

Let us learn from the following simple example: consider a fixed input
$a = (0.4, 0.4, \dots, 0.4)$ consisting of $n$ entries with value 0.4. A binary string is an
$\mathcal{I}_2$-global rounding of $a$ if and only if it contains no two consecutive entries 1, 1.
Such binary sequences correspond to vertices of the Fibonacci cube; the number of
such sequences equals the $(n+2)$-th Fibonacci number, hence it is exponential.
However, we have a compact rounding graph with $2n+1$ nodes, illustrated in
the left drawing of Figure 2. If we consider $\mathcal{I}_3$, we have the rounding graph in
the right drawing.




Fig. 2. Rounding graphs for $\mathcal{I}_2$ (left drawing) and $\mathcal{I}_3$ (right drawing). Each edge
corresponds to either 0 or 1.



Theorem 1. For any input sequence $a$, we can construct its rounding graph
with at most $nk + 1 - [k(k+1)/2]$ nodes representing the set of all $\mathcal{I}_k$-global
roundings.

The rest of this subsection is devoted to the proof of the above theorem. The
proof is constructive, and similar to the construction of a BDD (binary decision
diagram) from a decision tree. First, we consider the tree $T = T(a, \mathcal{I}_k)$ defined
in the previous section. We say two sequences $c$ and $c'$ are $(k-1)$-similar to each
other if they have the same length $\ell > k-1$ and the same suffix of
length $k-1$. The equivalence class of a sequence $c$ under $(k-1)$-similarity
is denoted by $\mathrm{class}(c)$. In this subsection, we concentrate on the family $\mathcal{I}_k$, and
hence simply write "global roundings" for $\mathcal{I}_k$-global roundings.

Two nodes $v(c)$ and $v(c')$ in $T$ are called similar to each other if $c$ and $c'$ are
$(k-1)$-similar. The following claim is easy to verify:

Claim A: If $v$ and $v'$ in $T$ are similar, there is a one-to-one matching
between the set of descendants of $v$ and that of $v'$ such that matched
nodes are similar to each other.

We fold the tree $T$ to obtain a graph $G(a, \mathcal{I}_k)$ in which similar nodes are
identified and unified into a single node. The edges of $T$ are also
unified without causing conflict, because of Claim A. Inherited from $T$, the graph
$G(a, \mathcal{I}_k)$ is a layered directed acyclic graph with $n+1$ layers. From the definition
of similarity, the unified edges have the same label. Due to Claim A, all
the outgoing edges with the same label must be unified; thus, each node has at
most two outgoing edges. Also, each edge has a label 0 or 1 inherited from $T$
without causing any conflict.

From the upper bound of $n+1$ global roundings for a sequence of length $n$ shown
above, there are at most $k$ different binary sequences which are
global roundings of a subsequence $a_i, a_{i+1}, \dots, a_{i+k-2}$ with respect to $\mathcal{I}_{k-1}$.
Hence, at each layer of $T$, there are at most $k$ different suffixes of the sequences
associated with nodes in the layer, and so there are at most $k$ nodes in each layer
of $G$. We can also easily see that the $i$-th layer has at most $i+1$ nodes for
$i < k-1$. This proves the theorem.

3.4 Algorithm to Compute a Compact Rounding Graph 

We want to compute $G(a, \mathcal{I}_k)$ efficiently. Since $\mathcal{I}_k$ is prefix complete, we can
apply a sweeping strategy similar to the case of $\mathcal{I}_n$.

Each node of $G(a, \mathcal{I}_k)$ corresponds to an equivalence class of a prefix of $a$,
and is written as $v(c)$, where $c$ is the representative of the equivalence class, namely
the lexicographically smallest member (in other words, the smallest member
if we regard binary sequences as integers in binary form) of the class.

Starting from 0, we construct $G(a, \mathcal{I}_k)$ from the source to the sinks, increasing
the level (i.e., depth) one by one. If the current level has depth $i$, we construct
vertices corresponding to equivalence classes of the global roundings of $a(\le i)$. As
we have shown in the previous subsection, there are at most $k$ such equivalence
classes. We maintain $\mathrm{diff}(c)$, $\mathrm{maxdiff}_k(c)$, and $\mathrm{mindiff}_k(c)$ for the representative
$c$ of the equivalence class corresponding to each node in the current level of the
graph by using the information of the previous level. Let $L(m)$ be the set of
representatives of the equivalence classes corresponding to nodes of the $m$-th
level of $G(a, \mathcal{I}_k)$.

Let $\ell(c)$ be the length of a sequence $c$. We define
$\mathrm{maxdiff}_k(c) = \max\{\mathrm{diff}(d) : d \text{ is a prefix of } c \text{ such that } \ell(d) > \ell(c) - k + 1\}$ and
$\mathrm{mindiff}_k(c) = \min\{\mathrm{diff}(d) : d \text{ is a prefix of } c \text{ such that } \ell(d) > \ell(c) - k + 1\}$.

Lemma 6. If $c = (c_1, c_2, \dots, c_m)$ is a prefix of a global rounding with respect
to $\mathcal{I}_k$, then $c \cdot c_{m+1}$ ($c_{m+1} = 1$ or $0$) is a prefix of a global rounding if and only if
$$\mathrm{maxdiff}_{k-1}(c) + a_{m+1} - 1 < c_{m+1} < \mathrm{mindiff}_{k-1}(c) + a_{m+1} + 1.$$

Hence, we can select all the global roundings among $\{c \cdot 0 : c \in L(m)\}$ and
$\{c \cdot 1 : c \in L(m)\}$ in $O(k)$ time. Thus, we can construct $G(a, \mathcal{I}_k)$ in $O(nk + nq)$
time if the following operations can be done in $O(q)$ amortized time for each
level: (1) Classify the set of global roundings among
$\{c \cdot 0 : c \in L(m)\} \cup \{c \cdot 1 : c \in L(m)\}$ into equivalence classes, and choose
representatives. (2) Compute the information $\mathrm{diff}$, $\mathrm{mindiff}_k$ and $\mathrm{maxdiff}_k$ for all
representatives in $L(m+1)$.

In order to implement operation (1), we consider a tree $T(m)$ built from the
set of representatives $c$ in $L(m)$. The tree has a leaf $l(c)$ for each $c \in L(m)$,
each edge has either 0 or 1 as its label, and the path from the root to $l(c)$
gives the suffix of length $k-1$ of $c$ in reverse order. For example, if $k = 4$
and $c = 0, 0, 1, 1, 0, 1, 1$, the path from the root gives the sequence $1, 1, 0$. It is
clear that $T(m)$ has $O(k^2)$ edges. From $T(m)$, we can construct $T(m+1)$ by
making two copies of $T(m)$, joining them at a new root with edges of labels 0
and 1 respectively, removing leaves which do not correspond to global roundings,
and upgrading each remaining leaf to its parent's place. If two leaves are upgraded to
the same position (i.e., if they have the same parent), we know that these two
leaves correspond to sequences in the same equivalence class.

In order to attain the $O(k)$ time complexity, we use a compressed form $H(m)$
of $T(m)$. Since $T(m)$ has only $k$ leaves, it has at most $k-1$ branching nodes. The
vertex set of $H(m)$ consists of the root, leaves, and branching points of $T(m)$.
We unite each path between consecutive branching points in $T(m)$ into an
edge of $H(m)$. The label sequence of the path in $T(m)$ associated with
an edge of $H(m)$ is stored in a cell with $O(k)$ space, and each edge of $H(m)$ has
a pointer to the cell containing its label sequence.
Instead of updating $T(m)$, we update $H(m)$ into $H(m+1)$. Copying
and modifying the structure of $H(m)$ into $H(m+1)$ can be done in $O(k)$ time.
We create two cells associated with the edges adjacent to the new root. At
most $O(k)$ cells storing label sequences are updated, and an update of a label
sequence either removes the last bit of the sequence or concatenates the sequences
in two cells; hence, each such operation can be done in $O(1)$ time. Thus, we can
do operation (1) in $O(k)$ time.

Operation (2) can be implemented in $O(k \log k)$ time by using a dynamic
tree data structure. Instead, we do it in $O(k)$ amortized time without using
a complicated data structure. We call a level $m$ a major-event level if $m$ is a
multiple of $k$. Other levels are called minor-event levels. At each major-event
level, we construct the history of the past $k$ levels, which is used in the following
minor-event levels. More precisely, consider a major-event level $m = jk$. For the
representative $c$ of each node in the current level, we consider its prefixes $c(\le i)$
for $(j-1)k < i \le jk$, and compute $\mathrm{intmin}(c[s, jk]) = \min_{s \le i \le jk} \mathrm{diff}(c(\le i))$
and $\mathrm{intmax}(c[s, jk]) = \max_{s \le i \le jk} \mathrm{diff}(c(\le i))$ for each $(j-1)k < s \le jk$. This
computation can be done from right to left in $O(k)$ time for each $c$, and hence in
$O(k^2)$ time for each major-event level.

At a minor-event level $L(m)$, if $jk$ is the previous major-event level,
we compute $\mathrm{localmin}(c) = \min_{jk < i \le m} \mathrm{diff}(c(\le i))$ and
$\mathrm{localmax}(c) = \max_{jk < i \le m} \mathrm{diff}(c(\le i))$. Since
$\mathrm{localmin}(c) = \min\{\mathrm{localmin}(c(\le m-1)), \mathrm{diff}(c)\}$ (an analogous formula holds
for $\mathrm{localmax}$), they can be computed in $O(1)$ time for each $c$. We can observe that
$\mathrm{mindiff}_k(c) = \min\{\mathrm{localmin}(c), \mathrm{intmin}(c[m-k+1, jk])\}$, and we can use the
intmin value stored for the ancestor of $c$ at the previous major-event level. An
analogous formula holds for $\mathrm{maxdiff}_k(c)$. Hence, the computation at a minor-event
level takes $O(k)$ time. Thus, the amortized time complexity per level is $O(k)$.
Hence, we have obtained the following theorem:

Theorem 2. The graph $G(a, \mathcal{I}_k)$ can be constructed in $O(nk)$ time using $O(k^2)$
working space.
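
The major-event / minor-event bookkeeping behind operation (2) is essentially a block decomposition for windowed minima. The sketch below illustrates the idea on a single sequence of values (standing in for the diff values along one representative) with 0-based indexing; the function name and this simplification are assumptions made for illustration, not the per-node structure of the paper.

```python
def windowed_minima(values, k):
    """For each m, return min(values[max(0, m-k+1) : m+1]), O(1) amortized."""
    result = []
    suffix_min = []   # suffix minima of the last completed block of k values
    local_min = None  # minimum of the current (partial) block
    for m, value in enumerate(values):
        if m % k == 0:
            # Major event: summarise the previous block from right to left.
            suffix_min = list(values[max(0, m - k):m])
            for s in range(len(suffix_min) - 2, -1, -1):
                suffix_min[s] = min(suffix_min[s], suffix_min[s + 1])
            local_min = value
        else:
            # Minor event: extend the running minimum of the current block.
            local_min = min(local_min, value)
        window_start = m - k + 1
        block_start = m - m % k
        if window_start >= block_start or not suffix_min:
            result.append(local_min)
        else:
            offset = window_start - (block_start - k)
            result.append(min(local_min, suffix_min[offset]))
    return result
```

The maximum is handled symmetrically. Each major event costs $O(k)$ and occurs once every $k$ steps, so the amortized cost per step is constant for one sequence, and $O(k)$ per level when $k$ representatives are maintained.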

For every node $v(c)$ of $G(a, \mathcal{I}_k)$ we can compute the number $n(v(c))$ of global
roundings of $a$ that have $c$ as their prefix. This can be done in $O(nk)$ time by
a dynamic programming procedure. By using this information, we can
generate global roundings uniformly at random by walking on the directed acyclic
graph $G(a, \mathcal{I}_k)$ (directed from the source to the sinks), choosing the next branch
(i.e., the next bit of the rounding) with probability proportional to the value
$n(v(c))$ of the child node.
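
The folding, counting and sampling steps can be put together in a short sketch. It is a simplified illustration, not the $O(nk)$ construction of Theorem 2: nodes are keyed directly by the suffix of the last $k-1$ bits, and each candidate edge is validated by rechecking all intervals of length at most $k$ that end at the new position. All function names are assumptions.

```python
import random
from fractions import Fraction


def build_rounding_graph(a, k):
    """layers[i] maps a suffix (last k-1 bits) to {bit: suffix in layer i+1}."""
    layers = [{(): {}}]
    for i in range(len(a)):
        next_layer = {}
        for suffix, edges in layers[i].items():
            for bit in (0, 1):
                window = suffix + (bit,)
                error, valid = Fraction(0), True
                # Check every interval of length <= k ending at position i.
                for t in range(1, len(window) + 1):
                    error += a[i - t + 1] - window[-t]
                    if abs(error) >= 1:
                        valid = False
                        break
                if valid:
                    key = window[1:] if len(window) == k else window
                    edges[bit] = key
                    next_layer.setdefault(key, {})
        layers.append(next_layer)
    return layers


def count_completions(layers):
    """counts[i][suffix] = number of global roundings passing through the node."""
    counts = [dict() for _ in layers]
    counts[-1] = {suffix: 1 for suffix in layers[-1]}
    for i in range(len(layers) - 2, -1, -1):
        for suffix, edges in layers[i].items():
            counts[i][suffix] = sum(counts[i + 1][nxt] for nxt in edges.values())
    return counts


def sample_rounding(layers, counts, rng):
    """Walk from the source, weighting each edge by the count of its target."""
    suffix, bits = (), []
    for i in range(len(layers) - 1):
        edges = layers[i][suffix]
        choices = sorted(edges)
        weights = [counts[i + 1][edges[bit]] for bit in choices]
        bit = rng.choices(choices, weights=weights)[0]
        bits.append(bit)
        suffix = edges[bit]
    return bits
```

For the input $(0.4, \dots, 0.4)$ and $k = 2$ this reproduces the Fibonacci example: the graph has at most two nodes per layer, while the count at the source grows exponentially with $n$.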

4 Fast Viterbi-Type Algorithms and Bicriteria 
Optimization 

Let us review Viterbi's algorithm (see Appendix) in a general form. For each
integral subinterval $J = [i+1, i+k] \subseteq [1,n]$ of length $k$, let us consider a function
$f_J$ assigning a real value $f_J(a,x)$ to each pair of a real sequence $a \in [0,1]^n$ and
a binary sequence $x \in \{0,1\}^n$ of length $n$. The function $f_J$ is called local if
$f_J(a,x)$ is determined by the entries of $a$ and $x$ located in the interval $J$.

Consider a commutative semigroup operation $\oplus$ satisfying monotonicity,
i.e., if $x_1 \ge y_1$ and $x_2 \ge y_2$ then $x_1 \oplus x_2 \ge y_1 \oplus y_2$. Examples of such operations
are $+$, $\max$, $\min$, and taking the $L_p$ norm $(|x_1|^p + |x_2|^p)^{1/p}$. Let us consider the
sum (under the $\oplus$ operation) $F(a,x) = \bigoplus_{i=0}^{n-k} f_{[i+1,i+k]}(a,x)$; we would like to
find a binary sequence $x$ minimizing $F(a,x)$.

Viterbi's dynamic programming algorithm can be applied to the above problem.
It is easy to see the following: suppose that $f_J(a,x)$ is local and computable
in $O(q)$ amortized time when we run the dynamic programming. Then, the binary
sequence $x$ minimizing $F(a,x)$ can be computed in $O(2^k n q)$ time. If we further
impose our global rounding condition, we have the following:

Theorem 3. Under the assumption above, the global rounding sequence $x$ of
$a$ with respect to $\mathcal{I}_k$ minimizing $F(a,x)$ can be computed in $O(knq)$ time.

Proof. We need to keep $k+1$ binary sequences instead of $2^k$ sequences in the
dynamic programming, because $G(a, \mathcal{I}_k)$ has at most $k+1$ nodes in a level. □

Corollary 1. The rounding minimizing the $L_\infty$ rounding error with respect to
$\mathcal{I}_k$ can be computed in $O(kn)$ time.

Proof. We set $f_{[i+1,i+k]}(a,x)$ to be the maximum of the absolute difference
between $a([i+s, i+k])$ and $x([i+s, i+k])$ over $s = 1, 2, \dots, k$. It is easy to see
that $f_J(a,x)$ can be computed in $O(1)$ amortized time by using the data structures
given in the previous sections. □

For a family of intervals $\mathcal{F}$, we can consider a nonnegative valued function
$w$ on $\mathcal{F}$ and define the weighted $l_p$ distance
$\mathrm{Dist}_p^{\mathcal{F},w}(a,b) = \left(\sum_{I \in \mathcal{F}} |a(I) - b(I)|^p \, w(I)\right)^{1/p}$
between $a$ and its rounding $b$. Although a weighted $l_p$ distance is
a nice measure of the quality of a rounding if we choose suitable $w$ and $p$, it is time
consuming to compute the optimal rounding with respect to this measure [2].
However, if we restrict the solution space to the set of global roundings with
respect to $\mathcal{I}_k$, we have the following:

Corollary 2. Given any weight function $w$, the global rounding minimizing the
weighted $l_p$ error with respect to $\mathcal{I}_k$ can be computed in $O(k^2 n)$ time.

5 Remarks on Digital Halftoning Applications 

From the viewpoint of practical applications, our main target is digital halftoning:
we would like to approximate a $[0,1]$-valued matrix $A$ with a binary matrix
$B$. One natural formulation is to define
$\mathrm{Dist}_\infty^{\mathcal{F}}(A,B) = \max_{R \in \mathcal{F}} |A(R) - B(R)|$ for a family $\mathcal{F}$ of subarrays,
and find $B$ minimizing this distance. However,
this problem is NP-hard, and even an approximation algorithm with a provable
constant approximation ratio is difficult to design. One heuristic method
is to round rows one by one, considering the relations to the roundings of previous
rows. Here, we must keep the rounding of the current row similar to the input
sequence (the global rounding property certifies this) to reduce the side effects
on the roundings of forthcoming rows, and also minimize the two-dimensional error
in the part of the matrix rounded so far (together with the current row).
For this purpose, the bicriteria method given in the preceding section is
suitable. Our experimental results will be reported elsewhere.

Appendix: Algorithms for Computing a Global Rounding

Error diffusion algorithm. Let $a = (a_1, a_2, \dots, a_n)$ be our input sequence
such that $0 \le a_j \le 1$ for all $j \in \{1, 2, \dots, n\}$. The error diffusion algorithm
computes a binary sequence $b$ from $b_1$ to $b_n$ greedily in an incremental fashion
in linear time. We always keep the difference $s_j = \sum_{i=1}^{j} (a_i - b_i)$ once we have
computed $b_1$ through $b_j$, and determine $b_{j+1}$ to be 1 if $s_j + a_{j+1} \ge 0.5$
and to be 0 otherwise. It can be easily seen that $-0.5 \le s_j < 0.5$ always holds,
and hence for any interval $I$, $\left|\sum_{i \in I} (a_i - b_i)\right| < 0.5 - (-0.5) = 1$.
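
A minimal sketch of the error diffusion algorithm follows. The treatment of the boundary case (rounding up when the running value is exactly 0.5) and the use of exact rational arithmetic are assumptions made here for definiteness; the function name is ours.

```python
from fractions import Fraction


def error_diffusion(a):
    """Greedy rounding keeping the running error s_j inside [-1/2, 1/2)."""
    half = Fraction(1, 2)
    bits, s = [], Fraction(0)
    for entry in a:            # each entry is assumed to lie in [0, 1]
        bit = 1 if s + entry >= half else 0
        s += entry - bit       # updated running error s_j
        bits.append(bit)
    return bits
```

Since every prefix error lies in a half-open window of width 1, the error on any interval, being a difference of two prefix errors, is strictly below 1 in absolute value.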

Viterbi's decoding algorithm is a dynamic programming algorithm that computes
a rounding $b$ of a sequence $a$ minimizing $\mathrm{Dist}_\infty^{\mathcal{F}}(a,b)$ for a given $\mathcal{F}$.

For each binary pattern $P$ of length $k$, the algorithm computes real numbers
$m_0(P,i)$ and $m(P,i)$ for $i = k, k+1, \dots, n$. The number $m_0(P,i)$ is the
discrepancy (with respect to $\mathcal{F}$) between $P$ and the subsequence of $a$ consisting
of the entries from $a_{i-k+1}$ to $a_i$.

We compute $m(P,i)$ for all patterns $P$ and all $k \le i \le n$ by a dynamic
programming procedure: as initialization, we consider the first $k$ entries of $a$,
and set $m(P,k) = m_0(P,k)$. Then, we sweep the sequence from left to right to
update the rounding error by
$$m(P,i) = \max\{m_0(P,i), \min\{m(P^+, i-1), m(P^-, i-1)\}\}.$$
Here, $P^+$ and $P^-$ are the patterns obtained by removing the last (i.e., rightmost)
bit of $P$ and appending 1 and 0, respectively, to its left. It can be seen that
$\min_P\{m(P,n)\}$ attains the minimum of $\mathrm{Dist}_\infty^{\mathcal{F}}(a,b)$ over all binary sequences,
and the sequence $b$ can be computed by backtracking the dynamic programming
process. The time complexity of this algorithm is $O(2^k(|\mathcal{F}| + n))$.
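
The recurrence can be sketched as follows for the family of all intervals of length at most $k$ (an assumption made here to fix $\mathcal{F}$). The sketch returns only the optimal value, recomputes each window discrepancy naively, and requires the sequence to have at least $k$ entries, so it does not attain the stated time bound; function names are ours.

```python
from fractions import Fraction
from itertools import product


def window_discrepancy(window_a, pattern):
    """m_0: maximum of |a(I) - P(I)| over all sub-intervals I of the window."""
    worst = Fraction(0)
    for start in range(len(pattern)):
        running = Fraction(0)
        for end in range(start, len(pattern)):
            running += window_a[end] - pattern[end]
            worst = max(worst, abs(running))
    return worst


def min_linf_error(a, k):
    """Minimum over binary b of the largest error on intervals of length <= k."""
    patterns = list(product((0, 1), repeat=k))
    # Initialization: m(P, k) = m_0(P, k) on the first k entries.
    best = {p: window_discrepancy(a[:k], p) for p in patterns}
    for i in range(k, len(a)):
        window_a = a[i - k + 1:i + 1]
        updated = {}
        for p in patterns:
            # Predecessors: drop the last bit of P and prepend 0 or 1.
            prev = min(best[(0,) + p[:-1]], best[(1,) + p[:-1]])
            updated[p] = max(window_discrepancy(window_a, p), prev)
        best = updated
    return min(best.values())
```

Recovering the rounding itself would additionally require storing, for each pattern and position, which predecessor attained the minimum.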

We remark that the original Viterbi decoding algorithm deals with the $L_1$
measure instead of the $L_\infty$ measure.

Network type algorithm. Asano et al. [?] applied the negative cycle detection
algorithm on a network to devise a polynomial time algorithm to compute the
rounding sequence b minimizing Dist^{a,b) in 0(min{A:^nlogn, log^ n}) 
time, where $k$ is the maximum length of the intervals of $\mathcal{F}$.
Abstract. The model of bulk-synchronous parallel (BSP) computation
is an emerging paradigm of general-purpose parallel computing. We propose
a new $p$-processor BSP algorithm for the all-pairs shortest paths
problem in a weighted directed dense graph. In contrast with the general
algebraic path algorithm, which performs $O(p^{1/2})$ to $O(p^{2/3})$ global
synchronisation steps, our new algorithm only requires $O(\log p)$
synchronisation steps.



1 Introduction 

The model of bulk-synchronous parallel (BSP) computation (see [?]) provides
a simple and practical framework for general-purpose parallel computing.
Its main goal is to support the creation of architecture-independent and scalable 
parallel software. Key features of BSP are its treatment of the communication 
medium as an abstract fully connected network, and strict separation of all inter- 
action between processors into point-to-point asynchronous data communication 
and barrier synchronisation. This separation allows an explicit and independent 
cost analysis of local computation, communication and synchronisation. 

In this paper we propose a new BSP algorithm for the all-pairs shortest paths 
problem in a weighted directed dense graph. This problem is a special case of 
the general algebraic path problem, therefore it is natural to compare the gen- 
eral algebraic path algorithm with our new all-pairs shortest paths algorithm. 
Similarly to the general algorithm, the new algorithm is efficient in local com- 
putation, and exhibits a tradeoff between communication and synchronisation; 
however, our algorithm requires significantly fewer global synchronisation steps. 



2 The BSP Model 

A BSP computer, introduced in [?], consists of $p$ processors connected by a
communication network. Each processor has a fast local memory. The processors 
may follow different threads of computation. A BSP computation is a sequence 
of supersteps. A superstep consists of an input phase, a local computation phase 
and an output phase. In the input phase, a processor receives data that were sent 
to it in the previous superstep; in the output phase, it can send data to other 
processors, to be received in the next superstep. The processors are synchronised 
between supersteps. The computation within a superstep is asynchronous. 

Let the cost unit be the cost of performing a basic arithmetic operation or a
local memory access. If, for a particular superstep, $w$ is the maximum number
of local operations performed by each processor, $h'$ (respectively, $h''$) is the
maximum number of data units received (respectively, sent) by each processor,
and $h = h' + h''$ (another possible definition is $h = \max(h', h'')$), then the
cost of the superstep is defined as $w + h \cdot g + l$. Here $g$ and $l$ are the BSP
parameters of the computer. The value $g$ is the communication throughput ratio
(also called "bandwidth inefficiency" or "gap"), and the value $l$ is the communication
latency (also called "synchronisation periodicity"). If a computation consists
of $S$ supersteps with costs $w_s + h_s \cdot g + l$, $1 \le s \le S$, then its total cost is
$W + H \cdot g + S \cdot l$, where $W = \sum_{s=1}^{S} w_s$ is the local computation cost,
$H = \sum_{s=1}^{S} h_s$ is the communication cost, and $S$ is the synchronisation cost.
The values of $W$, $H$ and $S$ typically depend on the number of processors $p$ and
on the problem size.
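
As a small worked illustration of the cost model, the total cost can be computed from a list of per-superstep measurements. The representation of a superstep as a triple $(w, h', h'')$ and the function name are assumptions made for this sketch.

```python
def bsp_cost(supersteps, g, l):
    """Total cost W + H*g + S*l for supersteps given as (w, h_in, h_out)."""
    W = sum(w for w, _, _ in supersteps)                     # local computation
    H = sum(h_in + h_out for _, h_in, h_out in supersteps)   # communication
    S = len(supersteps)                                      # synchronisation
    return W + H * g + S * l


# Two supersteps: W = 150, H = 35, S = 2.
example = bsp_cost([(100, 10, 5), (50, 0, 20)], g=4, l=30)
```

With these numbers the three terms are 150, 140 and 60 respectively, which shows how a computation with few data units but many supersteps can still be dominated by the latency term.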

Papers [?] present the McColl-Valiant BSP algorithm for standard (non-Strassen)
matrix multiplication. The local computation, communication and synchronisation
costs of this algorithm are

$$W = O(n^3/p) \qquad H = O(n^2/p^{2/3}) \qquad S = O(1)$$

Paper [?] extends this result to fast (Strassen-type) matrix multiplication. The
local computation, communication and synchronisation costs of the extended
algorithm are

$$W = O(n^\omega/p) \qquad H = O(n^2/p^{2/\omega}) \qquad S = O(1)$$

where $\omega$ is the exponent of fast matrix multiplication (currently 2.376 by [?]).

Many BSP algorithms are only defined for input sizes that are sufficiently
large with respect to the number of processors. This requirement is loosely
referred to as slackness. The algorithm presented in this paper needs a very
moderate amount of slackness: to compute all-pairs shortest paths in an $n$-node graph,
we must have $n > p$.

For the sake of simplicity, we ignore small irregularities that arise from im- 
perfect matching of integer parameters. For example, when we write “divide an 
array of size n equally across p processors” , the value n may not be an exact 
multiple of $p$, and therefore the shares may differ in size by $\pm 1$. We use square
bracket notation for matrices, referring to an element of an $n \times n$ matrix $A$ as
$A[i,j]$, $1 \le i, j \le n$.



3 Algebraic Path Computation 

In this section we consider the problem of finding the closure of a square matrix
over a semiring. This problem is also known as the algebraic path problem. It
unifies many seemingly unrelated computational problems, such as graph
connectivity, network reliability, regular language generation, and network capacity.
All these tasks can be viewed as instances of the algebraic path problem for an
appropriately chosen semiring. More information on applications of the algebraic
path problem can be found in [?].

Let an $n \times n$ matrix $A$ over a semiring represent a weighted graph with nodes
$1, \dots, n$. The length of an edge $i \to j$ is defined as the semiring element $A[i,j]$. If
the graph is not complete, we assume that non-edges have length zero. We denote
semiring addition and multiplication by $\oplus$ and $\odot$ respectively. When it does not
create confusion, we also denote semiring multiplication by juxtaposition (e.g.
$ab$ for $a \odot b$), and use standard notation for semiring powers (e.g. $a^2$ for $a \odot a$).

Let $A^* = I \oplus A \oplus A^2 \oplus \cdots$ be the closure of matrix $A$ (it is not guaranteed
to exist in a general semiring). The distance between nodes $i$, $j$ is defined as
the semiring element $A^*[i,j]$. Note that in this general setting, the distance does
not have to correspond to any particular "shortest" path in the graph. In the
special case where the semiring is the set of all nonnegative real numbers with
$\infty$, and the operations $\min$ and $+$ are used as $\oplus$ and $\odot$ respectively, lengths and
distances have their standard graph-theoretic meaning: in particular, $\infty$ plays
the role of the semiring zero, and the distances are realised by shortest paths.
We will return to this special case in Section 4.



In order to compute the closure of a square matrix over a general semiring, we 
use Gaussian elimination without pivoting. In the absence of pivoting, Gaussian 
elimination over a general semiring is not guaranteed to terminate. Guaranteed 
termination can be achieved by restricting the domain (e.g. considering closed 
semirings instead of arbitrary semirings) , or by restricting the type of the matrix 
(e.g. considering numerical matrices with certain special properties). In the case 
of numerical matrices, computation of the matrix closure corresponds to matrix
inversion: $A^* = (I - A)^{-1}$.

Let $A$ be an $n \times n$ matrix over a semiring. We assume that the closure of a
semiring element can be computed in time $O(1)$, whenever this closure exists.
Matrix closure $A^*$ can be computed by sequential Gaussian elimination in time
$O(n^3)$, provided that the computation terminates. This method is asymptotically
optimal for matrices over a general semiring, which can be shown by a standard
reduction of the matrix multiplication problem.
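
The sequential elimination can be sketched generically, with the semiring passed in as functions. This is an illustration only: the parameter names (`add`, `mul`, `star`, `one`) are assumptions, and the sketch copies the matrix at every elimination step for clarity instead of overwriting it in place.

```python
def closure(A, add, mul, star, one):
    """Closure A* = I + A + A^2 + ... by elimination without pivoting."""
    n = len(A)
    cur = [row[:] for row in A]
    for k in range(n):
        pivot = star(cur[k][k])          # closure of the pivot element
        nxt = [row[:] for row in cur]
        for i in range(n):
            for j in range(n):
                through_k = mul(mul(cur[i][k], pivot), cur[k][j])
                nxt[i][j] = add(cur[i][j], through_k)
        cur = nxt
    for i in range(n):
        cur[i][i] = add(one, cur[i][i])  # add the identity matrix I
    return cur


INF = float("inf")
# (min, +) semiring on nonnegative lengths: the closure of an element is 0.
graph = [[INF, 3, INF], [INF, INF, 1], [2, INF, INF]]
distances = closure(graph, min, lambda x, y: x + y, lambda x: 0, 0)
```

Instantiated with Boolean `or`/`and` and a `star` that always returns `True`, the same function computes the reflexive transitive closure of a directed graph.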



The parallel complexity of Gaussian elimination has been extensively studied
in many models of parallel computation. A BSP algorithm in [?] works by
reducing the problem to the computation of a three-dimensional cube dag (see
[?]; many similar algorithms have been proposed earlier in the context of
systolic computation). The BSP cost of the cube dag algorithm is $W = O(n^3/p)$,
$H = O(n^2/p^{1/2})$, $S = O(p^{1/2})$.

A lower communication cost for computing matrix closure can be achieved by
recursive block Gauss-Jordan elimination. This standard method was suggested
in [?] as a means of reducing the communication cost of a parallel transitive
closure algorithm, which is another special case of matrix closure. The BSP
cost of block Gauss-Jordan elimination was analysed in [?]; we summarise the
results here for completeness.

For convenience we assume that the resulting matrix $A^*$ must replace the
original matrix $A$. The algorithm works by dividing the matrix into square blocks
of size $n/2$,

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \qquad (1)$$

and then applying block Gauss-Jordan elimination:

$$\begin{array}{ll}
A'_{11} \leftarrow A_{11}^* & A''_{22} \leftarrow (A'_{22})^* \\
A'_{12} \leftarrow A'_{11} A_{12} & A''_{21} \leftarrow A''_{22} A'_{21} \\
A'_{21} \leftarrow A_{21} A'_{11} & A''_{12} \leftarrow A'_{12} A''_{22} \\
A'_{22} \leftarrow A_{22} \oplus A_{21} A'_{11} A_{12} & A''_{11} \leftarrow A'_{11} \oplus A'_{12} A''_{22} A'_{21}
\end{array} \qquad (2)$$

after which every $A''_{ij}$ overwrites $A_{ij}$. The procedure can be applied recursively
to find $A_{11}^*$ and $(A'_{22})^*$. The resulting matrix is

$$A^* = \begin{pmatrix} A_{11}^* \oplus A_{11}^* A_{12} \odot G^* \odot A_{21} A_{11}^* & A_{11}^* A_{12} \odot G^* \\ G^* \odot A_{21} A_{11}^* & G^* \end{pmatrix}$$

where $G = A_{22} \oplus A_{21} A_{11}^* A_{12}$ (here we use both $\odot$ and juxtaposition to denote
semiring multiplication). The computation terminates if all the closures taken exist.

The resulting BSP algorithm allows us to trade off the costs of communication
and synchronisation in a certain range. In order to account for this tradeoff, we
introduce a real parameter $\alpha$. The algorithm is as follows.

Algorithm 1. Algebraic path computation.

Parameters: integer $n > p$; real number $\alpha$, $\alpha_{\min} = 1/2 \le \alpha \le 2/3 = \alpha_{\max}$.
Input: $n \times n$ matrix $A$ over a semiring.

Output: $n \times n$ matrix closure $A^*$ (assuming it exists), overwriting $A$.
Description. The computation is defined by recursion on the size of the matrix.
For small blocks, (2) is computed sequentially on an arbitrarily chosen processor.
For large blocks, matrix multiplication in (2) is performed by the McColl-Valiant
algorithm on all $p$ processors. The details of the algorithm are described in [?].
Cost analysis. The analysis in [?] gives

$$W = O(n^3/p) \qquad H = O(n^2/p^\alpha) \qquad S = O(p^\alpha).$$

For $\alpha = \alpha_{\min} = 1/2$, the cost of Algorithm 1 is $W = O(n^3/p)$,
$H = O(n^2/p^{1/2})$, $S = O(p^{1/2})$. This is asymptotically equal to the BSP cost of the
cube dag method from [?]. For $\alpha = \alpha_{\max} = 2/3$, the cost of Algorithm 1 is
$W = O(n^3/p)$, $H = O(n^2/p^{2/3})$, $S = O(p^{2/3})$. In this case, the communication
cost is as low as in matrix multiplication (the McColl-Valiant algorithm). This
improvement in communication efficiency is offset by a reduction in synchronisation
efficiency. For large $n$, the communication cost of Algorithm 1 dominates
the synchronisation cost, and therefore the communication improvement should
outweigh the loss of synchronisation efficiency. This justifies the use of
Algorithm 1 with $\alpha = \alpha_{\max} = 2/3$. Smaller values of $\alpha$, or the cube dag algorithm,
should be considered when the problem is moderately sized.
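
The two-stage block elimination can be sketched sequentially in the $(\min,+)$ semiring with nonnegative lengths. This is an illustration of the recursion only: it does not model processors, the parameter $\alpha$ or the BSP costs, and the helper names are assumptions.

```python
INF = float("inf")


def mp_mul(X, Y):
    """(min, +) matrix product of a p x q and a q x r matrix."""
    return [[min(X[i][t] + Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def mp_add(X, Y):
    """Semiring addition of matrices: entrywise minimum."""
    return [[min(x, y) for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def block_closure(A):
    """Closure of A via recursive 2 x 2 block Gauss-Jordan elimination."""
    n = len(A)
    if n == 1:
        return [[0]]  # closure of a nonnegative length in (min, +) is 0
    h = n // 2
    A11 = [row[:h] for row in A[:h]]
    A12 = [row[h:] for row in A[:h]]
    A21 = [row[:h] for row in A[h:]]
    A22 = [row[h:] for row in A[h:]]
    # First stage: eliminate the leading block.
    B11 = block_closure(A11)
    B12 = mp_mul(B11, A12)
    B21 = mp_mul(A21, B11)
    G = mp_add(A22, mp_mul(A21, B12))
    # Second stage: eliminate the trailing block.
    C22 = block_closure(G)
    C21 = mp_mul(C22, B21)
    C12 = mp_mul(B12, C22)
    C11 = mp_add(B11, mp_mul(B12, C21))
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom
```

All the work outside the two recursive calls consists of block products, which is where a parallel matrix multiplication algorithm would be substituted.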

If the ground semiring is a commutative ring with unit, fast matrix multiplication
can be used instead of standard matrix multiplication for computing
block products. The BSP cost of the resulting algorithm is

$$W = O(n^\omega/p) \qquad H = O(n^2/p^\alpha) \qquad S = O(p^\alpha)$$

where $1/(\omega - 1) \le \alpha \le 2/\omega$.



4 All-Pairs Shortest Paths Computation 

4.1 Nonnegative Edge Lengths 

In Section 3 we considered the algebraic path problem over an arbitrary semiring.
Here we deal with a special case where the semiring is the set of real numbers
with $\infty$, and the numerical operations $\min$ and $+$ are used as semiring addition
$\oplus$ and multiplication $\odot$ respectively. Since the $\min$ operation is idempotent, for
all $i$, $j$ there is a path from $i$ to $j$ of length $A^*[i,j]$; this is one of the shortest
paths from $i$ to $j$. Most algorithms for matrix closure in the $(\min,+)$ semiring can
be extended to compute the shortest paths between all pairs of nodes, as well as
the distances. Therefore, in this section we use the term all pairs shortest paths
problem as a synonym for the matrix closure problem in the $(\min,+)$ semiring.
Initially, we consider the case where all edge lengths are nonnegative. We then
extend our method to general lengths.

The technique of Gauss-Jordan elimination, considered in Section 3, can be
applied to the all pairs shortest paths problem. In this context, Gauss-Jordan
elimination is commonly known as the Floyd-Warshall algorithm (see e.g. [3]).
Its block recursive version, identical to Algorithm 1, solves the problem with
BSP cost $W = O(n^3/p)$, $H = O(n^2/p^\alpha)$, $S = O(p^\alpha)$, for an arbitrary $\alpha$,
$1/2 \le \alpha \le 2/3$.

Alternatively, the problem with nonnegative lengths can be solved by Dijkstra's
algorithm ([?], see also [?]). This greedy algorithm finds all shortest paths
from a fixed source in order of increasing length. The sequential time complexity
of Dijkstra's algorithm is $O(n^2)$. To compute the shortest paths between all
pairs of nodes in parallel, one can apply Dijkstra's algorithm independently to
each node as a source (this approach is suggested e.g. in [?]). The resulting
algorithm has BSP cost $W = O(n^3/p)$, $H = O(n^2)$, $S = O(1)$. It thus has a
higher communication cost, but a lower synchronisation cost, than the
Floyd-Warshall algorithm. This tradeoff motivates us to look for an improved BSP
algorithm that would solve the all pairs shortest paths problem efficiently both
in communication and synchronisation.

In order to design such an algorithm, we use the principle of path doubling.
No shortest path may contain more than $n$ edges, therefore $A^n = A^*$. Matrix
$A^n$ can be obtained by repeated squaring in $\log n$ matrix multiplications.
Therefore, the local computation cost of computing $A^n$ by repeated squaring is
$W = O((n^3 \log n)/p)$. A refined version of path doubling was proposed in [?].
When run in parallel, this method allows one to compute the matrix $A^n = A^*$
with local computation cost $W = O(n^3/p)$. Compared to the Floyd-Warshall
algorithm, the new method does not improve on the synchronisation cost by
itself; however, an improvement can be achieved by combining the new method
with Dijkstra's algorithm.
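
Plain path doubling by repeated squaring can be sketched as follows. The sketch is sequential, assumes nonnegative edge lengths, and forces a zero diagonal so that the $k$-th power covers all paths with at most $k$ edges; the function names are ours.

```python
INF = float("inf")


def min_plus_square(X):
    """Square a matrix in the (min, +) semiring."""
    n = len(X)
    return [[min(X[i][t] + X[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def closure_by_squaring(A):
    """Compute A* = A^n by about log2(n) squarings."""
    n = len(A)
    X = [[0 if i == j else A[i][j] for j in range(n)] for i in range(n)]
    size = 1  # X currently covers all paths with at most `size` edges
    while size < n:
        X = min_plus_square(X)
        size *= 2
    return X
```

Every squaring is a full dense product, which is the source of the extra $\log n$ factor in the local computation cost mentioned above.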

By a small perturbation of edge lengths, we can always make all edge and 
path lengths in the graph distinct. Therefore, from now on we assume that all 
shortest paths are unique. We use the term path size for the number of edges 
in a path. The main idea of the method is to perform path doubling, keeping 
track not only of path lengths, but also of path sizes. We assume that lengths 
and sizes are kept in a single data structure, called the path matrix. In such a
matrix $X$, each entry $X[i,j]$ is either $\infty$, or corresponds to a simple path from
$i$ to $j$. Addition and multiplication of path matrices are defined in the natural
way.

For an integer $k$, let $X(k)$ denote the matrix of all paths in $X$ of size exactly
$k$. More precisely,

$$X(k)[i,j] = \begin{cases} X[i,j] & \text{if path } X[i,j] \text{ has size } k \\ \infty & \text{otherwise} \end{cases}$$

Let $X(k_1, \dots, k_s) = X(k_1) \oplus \cdots \oplus X(k_s)$ (remembering that $\oplus$ denotes numerical
$\min$). Note that for any path matrix $X$, we have

$$X = X(0, 1, \dots, m) = X(0) \oplus X(1) \oplus \cdots \oplus X(m)$$

where $m$ is the maximum path size in $X$.

For path matrices X, Y, we write A < T, if X[i,j] < Y[i,j] for all i,j 
(ignoring path sizes). We call an entry X[i,j] trivial, if X[i,j] = oo. We call A 
and Y disjoint, if either X[i,j], or is trivial for all i,j. 
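
These definitions can be made concrete with a small sketch. It assumes a path matrix is stored as a list of rows of `(length, size)` pairs, with `(inf, 0)` standing for a trivial entry; this representation, the helper names and the 3-node matrix are assumptions made for illustration only.

```python
import math

TRIVIAL = (math.inf, 0)  # "no path"; the size field of a trivial entry is ignored

def oplus(X, Y):
    """Semiring sum of path matrices: entrywise minimum by length."""
    return [[x if x[0] <= y[0] else y for x, y in zip(rx, ry)]
            for rx, ry in zip(X, Y)]

def restrict(X, k):
    """X(k): keep only the entries whose path has size exactly k."""
    return [[e if e[0] < math.inf and e[1] == k else TRIVIAL for e in row]
            for row in X]

def disjoint(X, Y):
    """True if at every position at least one of the two entries is trivial."""
    return all(x[0] == math.inf or y[0] == math.inf
               for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

# Hypothetical 3-node path matrix; each entry is (length, size).
X = [[(0, 0), (4, 1), (6, 2)],
     [TRIVIAL, (0, 0), (2, 1)],
     [(1, 1), (5, 2), (0, 0)]]
m = max(e[1] for row in X for e in row if e[0] < math.inf)
parts = [restrict(X, k) for k in range(m + 1)]
total = parts[0]
for part in parts[1:]:
    total = oplus(total, part)  # X(0) (+) X(1) (+) ... (+) X(m)
```

The matrices X(0), ..., X(m) are pairwise disjoint, so their semiring sum simply reassembles X.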

Consider the nonnegative all-pairs shortest paths problem defined by path
matrix A. This matrix contains all shortest paths of size 0 (the main diagonal)
and of size 1 (the off-diagonal entries). For an integer k, matrix $A^k$ contains all
shortest paths of size at most k (and maybe some other paths). Suppose that
we have computed $A^k$ for some k, $1 \le k < n$. Our next goal is to compute
all shortest paths of size at most 3k/2. Decompose the path matrix $A^k$ into a
disjoint semiring sum:

$$A^k = A^k(0, 1, \ldots, k) = I \oplus A^k(1) \oplus \cdots \oplus A^k(k)$$

Consider the upper half of this sum, which consists of matrices
$A^k(k/2 + 1), \ldots, A^k(k)$. The total number of nontrivial entries in all these
matrices is at most $n^2$ (since the matrices are disjoint), hence the average number
of nontrivial entries per matrix is at most $2n^2/k$. For some l, $k/2 < l \le k$, matrix
$A^k(l)$ contains at most $2n^2/k$ nontrivial entries. The BSP cost of finding such
an l is negligible.

Consider any shortest path of size in the range $l + 1, \ldots, 3k/2$. This path
consists of an initial subpath of size l, and a final subpath of size at most k.
Therefore, the semiring sum $A^k \oplus A^k(l) \odot A^k$ contains all shortest paths of size
at most 3k/2:

$$A^k \oplus A^k(l) \odot A^k = (I \oplus A^k(l)) \odot A^k \le A^{3k/2}$$

Since $A^k(l)$ has at most $2n^2/k$ nontrivial entries, computation of $A^k(l) \odot A^k$
requires not more than $2n^3/k$ semiring multiplications.
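
One round of this selective path doubling can be sketched sequentially as follows. The sketch assumes path matrices stored as rows of `(length, size)` pairs and an example graph (a directed 8-cycle with power-of-two edge lengths plus one chord) chosen so that all path lengths are distinct; the helper names and the graph are illustrative assumptions, and the parallel partitioning is not modelled.

```python
import math

TRIVIAL = (math.inf, 0)  # "no path"; the size field of a trivial entry is ignored

def oplus(X, Y):
    """Semiring sum: entrywise minimum by length."""
    return [[x if x[0] <= y[0] else y for x, y in zip(rx, ry)]
            for rx, ry in zip(X, Y)]

def odot(X, Y):
    """Semiring product: lengths and sizes add, the shorter candidate wins."""
    n = len(X)
    Z = [[TRIVIAL] * n for _ in range(n)]
    for i in range(n):
        for v in range(n):
            lx, sx = X[i][v]
            if lx == math.inf:
                continue  # trivial entries of the sparse argument cost nothing
            for j in range(n):
                ly, sy = Y[v][j]
                if lx + ly < Z[i][j][0]:
                    Z[i][j] = (lx + ly, sx + sy)
    return Z

def restrict(X, size):
    """X(size): entries whose path has exactly the given size."""
    return [[e if e[0] < math.inf and e[1] == size else TRIVIAL for e in row]
            for row in X]

def doubling_round(Ak, k):
    """Return (A^k (+) A^k(l) (.) A^k, l) for a sparsest l with k/2 < l <= k."""
    n = len(Ak)
    def nontrivial(l):
        return sum(e[0] < math.inf and e[1] == l for row in Ak for e in row)
    l = min(range(k // 2 + 1, k + 1), key=nontrivial)
    assert nontrivial(l) <= 2 * n * n / k  # the averaging bound from the text
    return oplus(Ak, odot(restrict(Ak, l), Ak)), l

# Example graph: edge i -> i+1 (mod 8) of length 2^(i+1), chord 0 -> 4 of length 1.
n = 8
A = [[TRIVIAL] * n for _ in range(n)]
for i in range(n):
    A[i][i] = (0, 0)
    A[i][(i + 1) % n] = (2 ** (i + 1), 1)
A[0][4] = (1, 1)
A2 = odot(A, A)
A4 = odot(A2, A2)
A8 = odot(A4, A4)            # the full closure for n = 8, used as a reference
R, l = doubling_round(A4, 4)  # should contain all shortest paths of size <= 6
```

The distinct power-of-two lengths play the role of the perturbation that makes shortest paths unique, which is what lets the recorded sizes be trusted.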

For efficient parallel computation of the sparse-by-dense matrix product
$A^k(l) \odot A^k$, we need to partition the problem into p sparse-by-dense matrix
multiplication subproblems, where all the sparse arguments have an approximately
equal number of nontrivial entries. This can be done by first partitioning the set
of rows in $A^k(l)$ into $p^{1/3}/k^{1/3}$ equal subsets, such that each subset contains at
most $2n^2/(k^{2/3} p^{1/3})$ nontrivial entries. This partitioning defines, up to a
permutation of rows, a decomposition of the matrix into $p^{1/3}/k^{1/3}$ equal horizontal
strips. Each strip defines an $\frac{n k^{1/3}}{p^{1/3}} \times n \times n$ sparse-by-dense matrix
multiplication subproblem.

Consider one of the above subproblems. Partition the set of columns in the
strip into $p^{1/3}/k^{1/3}$ equal subsets, such that each subset contains at most
$2n^2/(k^{1/3} p^{2/3})$ nontrivial entries. This partitioning defines, up to a permutation
of columns, a decomposition of the strip into $p^{1/3}/k^{1/3}$ equal square blocks. Each
block defines an $\frac{n k^{1/3}}{p^{1/3}} \times \frac{n k^{1/3}}{p^{1/3}} \times n$ sparse-by-dense matrix
multiplication subproblem. By partitioning the set of columns of the second
argument of this subproblem into $p^{1/3} k^{2/3}$ equal subsets, we obtain
$p^{1/3} \cdot k^{2/3}$ sparse-by-dense matrix multiplication subproblems of size
$\frac{n k^{1/3}}{p^{1/3}} \times \frac{n k^{1/3}}{p^{1/3}} \times \frac{n}{p^{1/3} k^{2/3}}$.

The total number of resulting sparse-by-dense matrix multiplication subproblems
is p. The sparse argument of each subproblem contains at most $2n^2/(k^{1/3} p^{2/3})$
nontrivial entries. The partitioning can be computed by a greedy algorithm, the
BSP cost of which is negligible. The BSP cost of computing the matrix product
$A^k(l) \odot A^k$ is therefore $W = O(n^3/(k \cdot p))$, $H = O(n^2/(k^{1/3} \cdot p^{2/3}))$, $S = O(1)$.
The path doubling process is stopped after at most $\log_{3/2} p$ rounds, when
the matrix $A^p$ (or some matrix $\le A^p$, which is only better) has been computed.
For some q, $0 < q \le p$, matrix $A^p(q)$ contains at most $n^2/p$ nontrivial entries.



Therefore, this matrix can be broadcast to every processor with communication
cost $H = O(n^2/p)$. Each processor receives the matrix $A^p(q)$, picks n/p nodes,
and computes all shortest paths originating in these nodes by n/p independent
runs of Dijkstra's algorithm. The result of this computation across all processors
is the matrix closure $A^p(q)^*$. Matrix $A^p(q)^*$ contains all shortest paths of sizes
that are multiples of q (and maybe some other paths).

Any shortest path in $A^*$ consists of an initial subpath of size that is a multiple
of q, and a final subpath of size at most $q \le p$. Therefore, all shortest paths for
the original matrix A can be computed as the matrix product

$$A^p(q)^* \odot A^p = A^*$$

The cost of the resulting algorithm is $W = O(n^3/p)$, $H = O(n^2/p^{2/3})$,
$S = O(\log p)$. We can further reduce the synchronisation cost by terminating the
path doubling phase after fewer than $\log_{3/2} p$ steps. For $1 \le r \le p^{2/3}$, we can
find a q such that the matrix $A^r(q)$ has at most $n^2/r$ nontrivial entries, therefore
the communication cost of applying Dijkstra's algorithm to find $A^r(q)^*$ is
$H = O(n^2/r)$.

The resulting algorithm is as follows. 

Algorithm 2. All pairs shortest paths (nonnegative case).

Parameters: integer n > p; integer r, $1 \le r \le p^{2/3}$.

Input: $n \times n$ matrix A over the (min, +) semiring of nonnegative real numbers
with $\infty$.

Output: $n \times n$ matrix closure $A^*$.

Description. The computation proceeds in three stages.

First stage. Compute $A^r$ by at most $\log_{3/2} r$ rounds of path doubling.

Second stage. Select q, $0 < q \le r$, such that $A^r(q)$ contains at most $n^2/r$
nontrivial entries. Broadcast $A^r(q)$ and compute the closure $A^r(q)^*$ by n independent
runs of Dijkstra's algorithm, n/p runs per processor.

Third stage. Compute the product $A^r(q)^* \odot A^r = A^*$.

Cost analysis. The local computation and communication costs of the first
stage are dominated by the cost of its first round: $W = O(n^3/p)$ and
$H = O(n^2/p^{2/3})$. The synchronisation cost of the first stage is $S = O(\log r)$.

The cost of the second stage is $W = O(n^3/p)$, $H = O(n^2/r)$, $S = O(1)$. The
cost of the third stage is $W = O(n^3/p)$, $H = O(n^2/p^{2/3})$, $S = O(1)$. The local
computation, communication and synchronisation costs of the whole algorithm
are

$$W = O(n^3/p) \qquad H = O(n^2/r) \qquad S = O(\log r)$$ ■
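
The three stages can be traced in a sequential sketch. It is a simplification under stated assumptions: the first stage computes $A^r$ by r - 1 plain products instead of selective path doubling, there are no processors or broadcasts, shortest paths are assumed unique, and the function names and the example graph (a directed 8-cycle with power-of-two lengths plus one chord) are hypothetical.

```python
import heapq
import math

TRIVIAL = (math.inf, 0)  # "no path"; entries are (length, size) pairs

def odot(X, Y):
    """Semiring product of path matrices: lengths and sizes add, min by length."""
    n = len(X)
    Z = [[TRIVIAL] * n for _ in range(n)]
    for i in range(n):
        for v in range(n):
            lx, sx = X[i][v]
            if lx == math.inf:
                continue
            for j in range(n):
                ly, sy = Y[v][j]
                if lx + ly < Z[i][j][0]:
                    Z[i][j] = (lx + ly, sx + sy)
    return Z

def dijkstra(adj, source):
    """Single-source shortest path lengths for nonnegative edge lengths."""
    dist = [math.inf] * len(adj)
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, length in adj[u]:
            if d + length < dist[v]:
                dist[v] = d + length
                heapq.heappush(heap, (d + length, v))
    return dist

def apsp_three_stage(A, r):
    n = len(A)
    # First stage (simplified): A^r by plain repeated multiplication.
    Ar = A
    for _ in range(r - 1):
        Ar = odot(Ar, A)
    # Second stage: pick the sparsest A^r(q), 0 < q <= r, and close it by Dijkstra.
    def entries(q):
        return [(i, j, Ar[i][j][0]) for i in range(n) for j in range(n)
                if Ar[i][j][0] < math.inf and Ar[i][j][1] == q]
    q = min(range(1, r + 1), key=lambda size: len(entries(size)))
    adj = [[] for _ in range(n)]
    for i, j, length in entries(q):
        adj[i].append((j, length))
    closure = [dijkstra(adj, s) for s in range(n)]
    # Third stage: the product A^r(q)* (.) A^r, keeping lengths only.
    return [[min(closure[i][v] + Ar[v][j][0] for v in range(n))
             for j in range(n)] for i in range(n)]

# Example graph: edge i -> i+1 (mod 8) of length 2^(i+1), chord 0 -> 4 of length 1.
n = 8
A = [[TRIVIAL] * n for _ in range(n)]
for i in range(n):
    A[i][i] = (0, 0)
    A[i][(i + 1) % n] = (2 ** (i + 1), 1)
A[0][4] = (1, 1)
dist = apsp_three_stage(A, 3)
```

With r = 1 the first stage is empty and the sketch degenerates into n independent Dijkstra runs, matching the multiple Dijkstra extreme of the algorithm.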

The two extremes of Algorithm 2 are the communication-efficient algorithm
($r = p^{2/3}$), with

$$W = O(n^3/p) \qquad H = O(n^2/p^{2/3}) \qquad S = O(\log p)$$

and the multiple Dijkstra algorithm (r = 1), with

$$W = O(n^3/p) \qquad H = O(n^2) \qquad S = O(1)$$

The second stage of Algorithm 2 allows the following variation. Instead of
using the matrix $A^r(q)$ with at most $n^2/r$ nontrivial entries, we can use the
matrix $A^r(r)$. In order to communicate this matrix efficiently, we represent it as
a product

$$A^r(r) = A^r(q) \odot A^r(r - q)$$

For some q, $0 < q \le r/2$, the disjoint sum $A^r(q) \oplus A^r(r - q)$ contains at most
$2n^2/r$ nontrivial entries. Therefore, the second stage of the algorithm can be
replaced by broadcasting the matrices $A^r(q)$ and $A^r(r - q)$ (or, equivalently, their
disjoint sum), recovering the product $A^r(q) \odot A^r(r - q) = A^r(r)$, and computing
the closure $A^r(r)^*$. A similar technique of broadcasting a path matrix can be
used on every step of path doubling in the first stage of Algorithm 2.

4.2 General Edge Lengths 

We now extend the algorithm to graphs where edge lengths may be negative. 
Formally, the problem consists in finding the closure A* of a matrix A over 
the (min,+) semiring of all real numbers with oo. The closure exists, if and 
only if the graph defined by the matrix does not contain a cycle of negative 
length. We cannot use our original method to solve this more general problem, 
because Dijkstra’s algorithm does not work on graphs with negative edge lengths. 
However, we can get around this difficulty by replacing Dijkstra’s algorithm with 
an extra stage of sequential path doubling. 

The extended algorithm has three stages. In the first stage, we compute the
matrix $A^{p^2}$ by $2 \log_{3/2} p$ steps of parallel path doubling. Let

$$A^{p^2}((p)) = A^{p^2}(p, 2p, \ldots, p^2)$$

and

$$A^{p^2}((p) - q) = A^{p^2}(p - q, 2p - q, \ldots, p^2 - q)$$

We represent matrix $A^{p^2}((p))$ as a product

$$A^{p^2}((p)) = A^{p^2}(q) \odot A^{p^2}((p) - q)$$

For some q, $0 < q \le p/2$, the disjoint sum $A^{p^2}(q) \oplus A^{p^2}((p) - q)$ contains at
most $2n^2/p$ nontrivial entries. In the second stage, we collect matrices $A^{p^2}(q)$
and $A^{p^2}((p) - q)$ in a single processor, and recover their product $A^{p^2}((p))$. Since
the matrix $A^{p^2}((p))$ represents paths of p different sizes $p, 2p, \ldots, p^2$, we can
find a size $l \in \{(p/2) \cdot p, (p/2 + 1) \cdot p, \ldots, p^2\}$, such that the matrix
$A^{p^2}((p))(l)$ contains at most $2n^2/p$ nontrivial entries. Now the closure
$A^{p^2}((p))^* = A^{p^2}(p)^*$ can be computed by sequential path doubling, with the first
step computing the semiring sum

$$A^{p^2}((p)) \oplus A^{p^2}((p))(l) \odot A^{p^2}((p)) \le A^{3p^2/2}((p))$$

The sequential cost of the closure computation is dominated by the cost of
its first step, $O(n^3/p)$. In the third stage, it remains to compute the product
$A^{p^2}(p)^* \odot A^{p^2} = A^*$.

In contrast with the nonnegative case, early termination of the parallel path 
doubling phase would increase not only the communication cost, but also the 
local computation cost. Therefore, we do not consider this option. 

The resulting algorithm is as follows.

Algorithm 3. All pairs shortest paths (general case).

Parameter: integer n > p.

Input: $n \times n$ matrix A over the (min, +) semiring of real numbers with $\infty$.
Output: $n \times n$ matrix closure $A^*$.

Description. The computation proceeds in three stages.

First stage. Compute $A^{p^2}$ and $A^{p^2}((p))$ by at most $2 \log_{3/2} p$ rounds of path
doubling.

Second stage. Select q, $0 < q \le p/2$, such that the disjoint sum
$A^{p^2}(q) \oplus A^{p^2}((p) - q)$ contains at most $2n^2/p$ nontrivial entries. Collect
$A^{p^2}(q) \oplus A^{p^2}((p) - q)$ in a single processor, and recover
$A^{p^2}((p)) = A^{p^2}(q) \odot A^{p^2}((p) - q)$. Compute the closure
$A^{p^2}((p))^* = A^{p^2}(p)^*$ by sequential path doubling.

Third stage. Compute the product $A^{p^2}(p)^* \odot A^{p^2} = A^*$.

Cost analysis. The local computation and communication costs of the first
stage are dominated by the cost of its first round: $W = O(n^3/p)$ and
$H = O(n^2/p^{2/3})$. The synchronisation cost of the first stage is $S = O(\log p)$.

The local computation cost of the second stage is dominated by the cost of
its first round, equal to $W = O(n^3/p)$. The communication and synchronisation
costs of the second stage are $H = O(n^2/p)$, $S = O(1)$.

The cost of the third stage is $W = O(n^3/p)$, $H = O(n^2/p^{2/3})$, $S = O(1)$.
The local computation, communication and synchronisation costs of the
whole algorithm are

$$W = O(n^3/p) \qquad H = O(n^2/p^{2/3}) \qquad S = O(\log p)$$ ■

The described method is applicable not only to the (min,+) semiring (the 
standard shortest paths problem), but also to any semiring where addition 
is idempotent, e.g. the $(\lor, \land)$ semiring (the transitive closure problem), the
(max, min) semiring (paths of maximum capacity), or the $(\max, \cdot)$ semiring
(paths of maximum reliability). Note that in the case of transitive closure
computation by Algorithm 2, Boolean matrix multiplication cannot be used instead
of general matrix multiplication, since the path doubling process involves the 
multiplication of path matrices, rather than ordinary Boolean matrices. It is 
not immediately clear if the BSP cost of general matrix multiplication can be 
reduced for path matrices with Boolean edge lengths. 

5 Conclusions 

We have presented a new BSP algorithm for the all-pairs shortest paths problem
in a weighted directed dense graph. The algorithm adapts the method of selective
path doubling from [?] to the BSP framework, and saves a substantial
amount of synchronisation by combining selective path doubling with Dijkstra's
algorithm. In contrast with the general algebraic path algorithm, which performs
$O(p^{1/2})$ to $O(p^{2/3})$ global synchronisation steps, our new algorithm only
requires $O(\log p)$ synchronisation steps. The number of synchronisation steps can
be further reduced to $O(1)$, if the edge lengths are nonnegative. In this case, the
algorithm exhibits a tradeoff between asymptotic costs of communication and
synchronisation.

Table 1. Summary of presented algorithms

Problem                      W        H               S
Matrix multiplication        n^3/p    n^2/p^{2/3}     1
Algebraic paths              n^3/p
  general, min H             -        n^2/p^{2/3}     p^{2/3}
  general, min S             -        n^2/p^{1/2}     p^{1/2}
All-pairs shortest paths     n^3/p
  general                    -        n^2/p^{2/3}     log p
  nonnegative, min H         -        n^2/p^{2/3}     log p
  nonnegative, min S         -        n^2             1

It is not clear yet whether the presented algorithm is practical, because of
the significant potential overhead of dealing with path matrices, instead of
ordinary numerical matrices (as e.g. in the Floyd-Warshall algorithm). However,
our algorithm advances the theoretical understanding of BSP computation on
dense graphs, and shows a possible source of faster parallel graph algorithms.
Abstract. We present a probabilistic algorithm that, given a connected
graph G (represented by adjacency lists) of maximum degree d, with
edge weights in the set $\{1, \ldots, w\}$, and given a parameter $0 < \varepsilon < 1/2$,
estimates in time $O(dw\varepsilon^{-2} \log \frac{w}{\varepsilon})$ the weight of the minimum spanning
tree of G with a relative error of at most $\varepsilon$. Note that the running time
does not depend on the number of vertices in G. We also prove a nearly
matching lower bound of $\Omega(dw\varepsilon^{-2})$ on the probe and time complexity
of any approximation algorithm for MST weight.

The essential component of our algorithm is a procedure for estimating
in time $O(d\varepsilon^{-2} \log \varepsilon^{-1})$ the number of connected components of an
unweighted graph to within an additive error of $\varepsilon n$. The time bound
is shown to be tight up to within the $\log \varepsilon^{-1}$ factor. Our connected-components
algorithm picks $O(1/\varepsilon^2)$ vertices in the graph and then grows
"local spanning trees" whose sizes are specified by a stochastic process.
From the local information collected in this way, the algorithm is able
to infer, with high confidence, an estimate of the number of connected
components. We then show how estimates on the number of components
in various subgraphs of G can be used to estimate the weight of its MST.



1 Introduction 

Traditionally, a linear time algorithm has been held as the gold standard of 
efficiency. In a wide variety of settings, however, large data sets have become 
increasingly common, and it is often desirable and sometimes necessary to find 
very fast algorithms which can assert nontrivial properties of the data in sublin- 
ear time. 

One direction of research that has been suggested is that of property testing
[?], which relaxes the standard notion of a decision problem. Property
testing algorithms distinguish between inputs that have a certain property and
those that are far (in terms of Hamming distance, or some other natural
distance) from having the property. Sublinear and even constant time algorithms
have been designed for testing various algebraic and combinatorial properties
(see [?] for a survey). Property testing can be viewed as a natural type of
approximation problem and, in fact, many of the property testers have led to very
fast, even constant time, approximation schemes for the associated problem (cf.
[?]). For example, one can approximate the value of a maximum cut in a
dense graph in time 2^^^ with relative error at most £, by looking at 

only 0(e“^logl/e) locations in the adjacency matrix 0. Note that typically 
such schemes approximate the value of the optimal solution, here the size of a 
maxcut, without computing the structure that achieves it, i.e., the actual cut. 
Sometimes, however, a solution can also be constructed in linear or near-linear 
time. 



In this paper, we consider the problem of finding the weight of the minimum
spanning tree (MST) of a graph. Finding the MST of a graph has a long and
interesting history [?]. Currently the best known deterministic algorithm
of Chazelle [?] runs in $O(m\alpha(m,n))$ time, where n (resp. m) is the number of
vertices (resp. edges) and $\alpha$ is inverse-Ackermann, and the randomized algorithm
of Karger, Klein and Tarjan [?] runs in linear expected time (see also [?] for
alternative models).

In this paper, we show that there are conditions under which it is possible to
approximate the weight of the MST of a connected graph in time sublinear in the
number of edges. We give an algorithm which approximates the MST of a graph
G to within a multiplicative factor of $1 + \varepsilon$ and runs in time $O(dw\varepsilon^{-2} \log \frac{w}{\varepsilon})$ for
any G with max degree d and edge weights in the set $\{1, \ldots, w\}$. The relative
error $\varepsilon$ ($0 < \varepsilon < 1/2$) is specified as an input parameter. Note that if d and $\varepsilon$
are constant and the ratios of the edge weights are bounded, then the algorithm
runs in constant time. We also extend our algorithm to the case where G has
nonintegral weights in the range [1, w], achieving a comparable runtime with a
somewhat worse dependence on $\varepsilon$.

Our algorithm considers several auxiliary graphs: If G is the weighted graph,
let us denote by $G^{(i)}$ the subgraph of G that contains only edges of weight at
most i. We estimate the number of connected components in each $G^{(i)}$. To do
so, we sample uniformly at random $O(1/\varepsilon^2)$ vertices in $G^{(i)}$, and then estimate
the size of the component that contains each sampled vertex by constructing
"local trees" of some appropriate size defined by a random process. Based on
information about these local trees, we can produce a good approximation for the
weight of the MST of G. Our algorithm for estimating the number of connected
components in a graph runs in time $O(d\varepsilon^{-2} \log \varepsilon^{-1})$ and produces an estimate
that is within an additive error of $\varepsilon n$ of the true count. The method is based on a
similar principle as the property tester for graph connectivity given by Goldreich
and Ron [?].

We give a lower bound of $\Omega(dw/\varepsilon^2)$ on the time complexity of any algorithm
which approximates the MST weight. In order to prove the lower bound, we give
two distributions on weighted graphs, where the support set of one distribution
contains graphs with MST weight at least $1 + \varepsilon$ times the MST weight of the
graphs in the support of the other distribution. We show that any algorithm that
reads $o(dw/\varepsilon^2)$ weights from the input graph is unlikely to distinguish between
graphs from the two distributions. We also prove a lower bound of $\Omega(d/\varepsilon^2)$
on the running time of any approximation algorithm for counting connected
components.



2 Estimating the Number of Connected Components 

We begin with the problem of estimating the number of components in an
arbitrary graph G. We present an algorithm which gives an additive estimate of
the number of components in G to within $\varepsilon n$ in $O(d\varepsilon^{-2} \log \varepsilon^{-1})$ time, for any
$0 < \varepsilon < 1/2$. We later show how to use the ideas from our algorithm to aid in
estimating the weight of the MST of a graph.

Let c be the number of connected components in G. Let $n_u$ be the number
of vertices in u's component in G. Our algorithm is built around a simple
observation:

Fact 1 Given a graph with vertex set V, for every connected component $I \subseteq V$,
$\sum_{u \in I} \frac{1}{n_u} = 1$ and $\sum_{u \in V} \frac{1}{n_u} = c$.

Our strategy is to estimate c by approximating each summand $1/n_u$. Computing
$n_u$ directly can take linear time, so we construct an estimator of the
quantity $1/n_u$ that has the same expected value. We approximate the number
of connected components via the algorithm given in Figure 1. The parameter W
is a threshold value, which is set to $2/\varepsilon$ for counting connected components and
somewhat higher for its use in MST weight estimation.
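
Fact 1 is easy to check exactly on a small instance. The sketch below assumes an adjacency-list graph given as a list of neighbour lists; the helper name `component_sizes` and the 6-vertex example are illustrative, and exact rational arithmetic is used so that the sum comes out as an integer.

```python
from collections import deque
from fractions import Fraction

def component_sizes(adj):
    """Return n_u for every vertex u: the size of the component containing u."""
    size = [0] * len(adj)
    for s in range(len(adj)):
        if size[s]:
            continue  # component of s already explored
        comp, seen, queue = [s], {s}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.append(v)
                    queue.append(v)
        for u in comp:
            size[u] = len(comp)
    return size

# Hypothetical graph with components {0, 1, 2}, {3, 4} and {5}.
adj = [[1, 2], [0, 2], [0, 1], [4], [3], []]
sizes = component_sizes(adj)
c = sum(Fraction(1, s) for s in sizes)  # each component contributes exactly 1
```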



approx-number-connected-components(G, ε, W)

    uniformly choose r = O(1/ε²) vertices u_1, ..., u_r
    for each vertex u_i,
        set β_i = 0
        take the first step of a BFS from u_i
        (*) flip a coin
        if heads and number of vertices visited in BFS < W
            then resume BFS to double number of visited vertices
                if this allows BFS to complete
                    then set β_i = 2^{# coin flips} / # vertices visited in BFS
                    else go to (*)
    output ĉ = (n/r) Σ_{i=1}^{r} β_i



Fig. 1. Estimating the number of connected components

In the algorithm, doubling the number of vertices does not include duplicate
visits to the same vertices; in other words, at each step the number of new
vertices visited is supposed to match the number of vertices already visited. In
our terminology, the first step of the BFS (shorthand for breadth first search)
involves the visit of the single vertex $u_i$. We now bound the expectation and
variance of the estimator $\beta_i$ for a fixed i. If the BFS from $u_i$ completes, the
number of coin flips associated with it is $\lceil \log n_{u_i} \rceil$ and the number of distinct
vertices visited is $n_{u_i}$. Let S denote the set of vertices in components of size
< W. If $u_i \notin S$, then $\beta_i = 0$; otherwise, it is $2^{\lceil \log n_{u_i} \rceil}/n_{u_i}$ with probability
$2^{-\lceil \log n_{u_i} \rceil}$ and 0 otherwise. Since $\beta_i \le 2$, the variance of $\beta_i$ is:

$$\mathrm{var}\, \beta_i \le \mathbf{E}\, \beta_i^2 \le 2\, \mathbf{E}\, \beta_i = \frac{2}{n} \sum_{u \in S} \frac{1}{n_u} \le \frac{2c}{n}$$

Then the variance of $\hat{c}$ is bounded by

$$\mathrm{var}\, \hat{c} = \mathrm{var} \Bigl( \frac{n}{r} \sum_i \beta_i \Bigr) = \frac{n^2}{r} \cdot \mathrm{var}\, \beta_i \le \frac{2nc}{r}. \qquad (1)$$

Since the number of components with vertices not in S is at most n/W, we have
that

$$c - \frac{n}{W} \le \mathbf{E}\, \hat{c} = \sum_{u \in S} \frac{1}{n_u} \le c.$$

If we set $W = 2/\varepsilon$, then

$$c - \frac{\varepsilon n}{2} \le \mathbf{E}\, \hat{c} \le c \qquad (2)$$

and, by Chebyshev,

$$\mathrm{Prob}[\, |\hat{c} - \mathbf{E}\, \hat{c}| > \varepsilon n/2 \,] < \frac{\mathrm{var}\, \hat{c}}{(\varepsilon n/2)^2} \le \frac{8c}{\varepsilon^2 r n}. \qquad (3)$$

Choosing $r = O(1/\varepsilon^2)$ ensures that with constant probability arbitrarily close
to 1, our estimate $\hat{c}$ of the number of connected components deviates from the
actual value by at most $\varepsilon n$.

The expected number of vertices visited in a given execution of the "for
loop" is $O(\log W)$, and each newly visited vertex incurs a cost of $O(d)$, so the
algorithm runs in expected time $O(d\varepsilon^{-2} \log W)$. For our setting of W, this is
$O(d\varepsilon^{-2} \log \varepsilon^{-1})$. As stated, the algorithm's running time is randomized. However,
one can get a deterministic running time bound by stopping the algorithm after
$Cd\varepsilon^{-2} \log \varepsilon^{-1}$ steps and outputting 0 if the algorithm has not yet terminated.
This event occurs with probability at most $O(1/C)$, which is a negligible addition
to the error probability. Thus we have the following theorem:

Theorem 2. Let c be the number of components in a graph with n vertices. Then
Algorithm approx-number-connected-components runs in time $O(d\varepsilon^{-2} \log \varepsilon^{-1})$
and with probability at least 3/4 outputs $\hat{c}$ such that $|\hat{c} - c| \le \varepsilon n$.
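
The estimator can be sketched as follows. This is a minimal illustration under stated assumptions: the graph is a list of neighbour lists, the sample size r is passed in directly rather than derived from ε, the random source is any object with `random()` and `randrange()` (such as `random.Random`), and the function names are not from the paper. The helper `grow` pauses the BFS as soon as a vertex beyond the target would be visited, so a component is reported as complete only when it has really been exhausted.

```python
import random
from collections import deque

def beta(adj, u, W, rng):
    """One sample of the estimator beta_i for the start vertex u."""
    seen, queue, flips = {u}, deque([u]), 0

    def grow(target):
        # Resume the BFS until `target` vertices are seen.
        # Returns True if the whole component has been explored.
        while queue:
            x = queue[0]
            for y in adj[x]:
                if y not in seen:
                    if len(seen) == target:
                        return False  # x stays queued and is rescanned later
                    seen.add(y)
                    queue.append(y)
            queue.popleft()
        return True

    while len(seen) < W:
        flips += 1
        if rng.random() < 0.5:  # tails: stop with beta = 0
            return 0.0
        if grow(2 * len(seen)):
            return 2 ** flips / len(seen)
    return 0.0  # component too large for the threshold W

def approx_components(adj, r, W, rng):
    """Estimate the number of connected components from r sampled vertices."""
    n = len(adj)
    total = sum(beta(adj, rng.randrange(n), W, rng) for _ in range(r))
    return n * total / r

# Hypothetical graph with 5 components of sizes 1, 1, 2, 3 and 5.
adj = [[], [], [3], [2], [5, 6], [4, 6], [4, 5],
       [8], [7, 9], [8, 10], [9, 11], [10]]
estimate = approx_components(adj, 4000, 100, random.Random(1))
```

With W larger than every component the estimator is unbiased, so for a few thousand samples the estimate should land close to the true count of 5.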
We can improve the running time to $O((\varepsilon + c/n)\, d\varepsilon^{-2} \log \varepsilon^{-1})$, which is much
better for small values of c. First, run the algorithm for $r = O(1/\varepsilon)$. By
Chebyshev and (1), (2),

$$\mathrm{Prob}\Bigl[\, |\hat{c} - \mathbf{E}\, \hat{c}| > \frac{\mathbf{E}\, \hat{c} + \varepsilon n}{2} \,\Bigr] < \frac{8nc}{r(c + \varepsilon n/2)^2} \le \frac{8n}{r(c + \varepsilon n/2)},$$

which is arbitrarily small for $r\varepsilon$ large enough. Next, we use this approximation
$\hat{c}$ to "improve" the value of r. We set $r = A/\varepsilon + A\hat{c}/(\varepsilon^2 n)$ for some large enough
constant A and we run the algorithm again, with the effect of producing a second
estimate $c^*$. By (3),

$$\mathrm{Prob}[\, |c^* - \mathbf{E}\, c^*| > \varepsilon n/2 \,] < \frac{8c}{\varepsilon^2 r n} \le \frac{16c}{A\varepsilon n + A\, \mathbf{E}\, \hat{c}} \le \frac{16}{A},$$

and so with overwhelming probability, our second estimate $c^*$ of the number of
connected components deviates from c by at most $\varepsilon n$. The running time of this
new algorithm is $O((\varepsilon + c/n)\, d\varepsilon^{-2} \log \varepsilon^{-1})$.



3 Approximating the Weight of an MST 

In this section we present an algorithm for approximating the value of the MST
in bounded weight graphs. We are given a connected graph G with maximum
degree d and with each edge assigned an integer weight between 1 and w.
We assume that G is represented by adjacency lists or, for that matter, any
representation that allows one to access all edges incident to a given vertex in
$O(d)$ time. We show how to approximate the weight of the minimum spanning
tree of G with a relative error of at most $\varepsilon$.

In Section 3.1 we give a new way to characterize the weight of the MST in
terms of the number of connected components in subgraphs of G. In Section 3.2
we give the main algorithm and its analysis. Finally, Section 3.3 addresses how
to extend the algorithm to the case where G has nonintegral weights.



3.1 MST Weight and Connected Components 

We reduce the computation of the MST weight to counting connected components
in various subgraphs of G. To motivate the new characterization, consider
the special case when G has only edges of weight 1 or 2 (i.e., w = 2). Let $G^{(1)}$
be the subgraph of G consisting precisely of the edges of weight 1, and let $n_1$ be
its number of connected components. Then, any MST in G must contain exactly
$n_1 - 1$ edges of weight 2, with all the others being of weight 1. Thus, the weight
of the MST is exactly $n - 2 + n_1$. We easily generalize this derivation to any w.

For each $0 \le \ell \le w$, let $G^{(\ell)}$ denote the subgraph of G consisting of all the
edges of weight at most $\ell$. Define $c^{(\ell)}$ to be the number of connected components
in $G^{(\ell)}$ (with $c^{(0)}$ defined to be n). By our assumption on the weights, $c^{(w)} = 1$.
Let M(G) be the weight of the minimum spanning tree of G. Using the above
quantities, we give an alternate way of computing the value of M(G):

Claim 3. For integer $w \ge 2$,

$$M(G) = n - w + \sum_{i=1}^{w-1} c^{(i)}.$$

Proof: Let $\alpha_i$ be the number of edges of weight i in an MST of G. (Note that $\alpha_i$
is independent of which MST we choose [?].) Observe that for all $0 \le \ell \le w - 1$,
$\sum_{i > \ell} \alpha_i = c^{(\ell)} - 1$, therefore

$$M(G) = \sum_{i=1}^{w} i\, \alpha_i = \sum_{\ell=0}^{w-1} \sum_{i=\ell+1}^{w} \alpha_i = -w + \sum_{\ell=0}^{w-1} c^{(\ell)} = n - w + \sum_{i=1}^{w-1} c^{(i)}.$$

□

Thus, computing the number of connected components allows us to compute
the weight of the MST of G.
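
Claim 3 can be checked against a standard MST computation. The sketch below assumes a connected graph given as a list of `(u, v, weight)` triples with integer weights in 1..w; Kruskal's algorithm with a simple union-find is used as the reference, and all names and the example graph are illustrative.

```python
def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def count_components(n, edges):
    """Number of connected components of the graph (n vertices, given edges)."""
    parent, c = list(range(n)), n
    for u, v, _ in edges:
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv
            c -= 1
    return c

def mst_weight(n, edges):
    """Reference MST weight by Kruskal's algorithm."""
    parent, total = list(range(n)), 0
    for u, v, wt in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv
            total += wt
    return total

def mst_weight_from_components(n, edges, w):
    """M(G) = n - w + sum of c^(i) over i = 1..w-1, as in Claim 3."""
    return n - w + sum(
        count_components(n, [e for e in edges if e[2] <= i])
        for i in range(1, w))

# Hypothetical connected graph on 5 vertices with weights in {1, 2, 3}.
edges = [(0, 1, 1), (1, 2, 2), (2, 3, 1), (3, 4, 3), (0, 4, 2), (1, 3, 3)]
by_kruskal = mst_weight(5, edges)
by_claim = mst_weight_from_components(5, edges, 3)
```

Here $c^{(1)} = 3$ and $c^{(2)} = 1$, so the claim gives 5 - 3 + 4 = 6, the same value Kruskal's algorithm finds.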

3.2 The Main Algorithm 

Our algorithm approximates the value of the MST by estimating each of the
$c^{(i)}$'s. The algorithm is given in Figure 2.



approx-MST-weight(G, ε)

    For i = 1, ..., w − 1
        ĉ^(i) = approx-number-connected-components(G^(i), ε, 2w/ε)
    output v̂ = n − w + Σ_{i=1}^{w−1} ĉ^(i)



Fig. 2. Approximating the weight of the MST 



Theorem 4. Let v be the weight of the MST of G. Algorithm approx-MST-weight
runs in time $O(dw\varepsilon^{-2} \log \frac{w}{\varepsilon})$ and outputs a value $\hat{v}$ that, with probability at least
3/4, differs from v by at most $\varepsilon v$.

Proof: Let $c = \sum_{i=1}^{w-1} c^{(i)}$ and $\hat{c} = \sum_{i=1}^{w-1} \hat{c}^{(i)}$. Since we call
approx-number-connected-components with parameter $W = 2w/\varepsilon$, (1) and (2) become

$$c^{(i)} - \frac{\varepsilon n}{2w} \le \mathbf{E}\, \hat{c}^{(i)} \le c^{(i)} \quad \text{and} \quad \mathrm{var}\, \hat{c}^{(i)} \le \frac{2n c^{(i)}}{r}.$$

By summing over i, it follows that $c - \varepsilon n/2 \le \mathbf{E}\, \hat{c} \le c$ and $\mathrm{var}\, \hat{c} \le 2nc/r$.
Choosing $r\varepsilon^2$ large enough, by Chebyshev we have

$$\mathrm{Prob}[\, |\hat{c} - \mathbf{E}\, \hat{c}| > (n - w + c)\varepsilon/3 \,] < \frac{18nc}{r\varepsilon^2 (n - w + c)^2},$$

which is arbitrarily small since we may assume that w/n is sufficiently small
(else we might as well compute the MST explicitly, which can be done in $O(dn)$
time [?]). It follows that, with high probability, the error on the estimate satisfies

$$|v - \hat{v}| = |c - \hat{c}| \le \frac{\varepsilon n}{2} + \frac{\varepsilon (n - w + c)}{3} \le \varepsilon v.$$

Since the expected running time of each call to approx-number-connected-components
is $O(dr \log \frac{w}{\varepsilon})$, the total running time is $O(dw\varepsilon^{-2} \log \frac{w}{\varepsilon})$. As before, the
running time can be made deterministic by stopping execution of the algorithm
after $Cdw\varepsilon^{-2} \log \frac{w}{\varepsilon}$ steps for some appropriately chosen constant C. □



3.3 Nonintegral Weights 

Suppose the weights of G are all in the range [1, w], but are not necessarily
integral. To extend the algorithm to this case, one can multiply all the weights
by $1/\varepsilon$ and round each weight to the nearest integer. Then one can run the above
algorithm with error parameter $\varepsilon/2$ and with a new range of weights $[1, \lceil w/\varepsilon \rceil]$ to
get a value $\hat{v}$. Finally, output $\varepsilon \hat{v}$. The relative error introduced by the rounding
is at most $\varepsilon/2$ per edge in the MST, and hence $\varepsilon/2$ for the whole MST, which
gives a total relative error of at most $\varepsilon$. The runtime of the above algorithm is
$O(dw\varepsilon^{-3} \log \frac{w}{\varepsilon})$.
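
The effect of the rounding step alone can be isolated in a short sketch. As an assumption made for illustration, an exact Kruskal computation on the rounded weights stands in for the approximation algorithm, so only the rounding error (at most $\varepsilon/2$ relative) is visible; the function names and the random test graph are hypothetical.

```python
import random

def mst_weight(n, edges):
    """Exact MST weight by Kruskal's algorithm with a simple union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0
    for u, v, wt in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += wt
    return total

def rounded_mst_estimate(n, edges, eps):
    """Scale weights by 1/eps, round to integers, solve, and scale back by eps."""
    scaled = [(u, v, round(wt / eps)) for u, v, wt in edges]
    return eps * mst_weight(n, scaled)

# Hypothetical connected graph with real weights in [1, 4].
rng = random.Random(0)
n = 10
edges = [(i, i + 1, rng.uniform(1, 4)) for i in range(n - 1)]
edges += [(rng.randrange(n), rng.randrange(n), rng.uniform(1, 4)) for _ in range(15)]
edges = [e for e in edges if e[0] != e[1]]
eps = 0.1
exact = mst_weight(n, edges)
estimate = rounded_mst_estimate(n, edges, eps)
```

Each rounded weight, scaled back by ε, differs from the original by at most ε/2, and every weight is at least 1, which is why the relative error of any spanning tree, and hence of the MST weight, stays within ε/2.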

4 Lower Bounds 

We prove that our algorithms for estimating the MST weight and counting con- 
nected components are essentially optimal. 

Theorem 5. Any probabilistic algorithm for approximating, with relative error
$\varepsilon$, the MST weight of a connected graph with max degree d and weights in
$\{1, \ldots, w\}$ requires $\Omega(dw\varepsilon^{-2})$ edge weight lookups on average. It is assumed that
w > 1 and $C\sqrt{w/n} < \varepsilon < 1/2$, for some large enough constant C.

We can obviously assume that w > 1, otherwise the MST weight is always
n − 1 and no work is required. The lower bound on $\varepsilon$ is nonrestrictive since
we can always compute the MST exactly in $O(dn)$ time, which is $O(dw\varepsilon^{-2})$ for
$\varepsilon = O(\sqrt{w/n})$.

Theorem 6. Given a graph with n vertices, any probabilistic algorithm for
approximating the number of connected components with an additive error of $\varepsilon n$
requires $\Omega(d\varepsilon^{-2})$ edge lookups on average. It is assumed that $C/\sqrt{n} < \varepsilon < 1/2$,
for some large enough constant C.

Again, note that the lower bound on $\varepsilon$ is nonrestrictive since we can always
solve the problem exactly in $O(dn)$ time.

Both proofs revolve around the difficulty of distinguishing between two
nearby distributions. For any $0 < q \le 1/2$ and s = 0, 1, let $\mathcal{D}_q^s$ denote the
distribution induced by setting a 0/1 random variable to 1 with probability
$q_s = q(1 + (-1)^s \varepsilon)$. We define a distribution $\mathcal{D}$ on n-bit strings as follows: (1)
pick s = 1 with probability 1/2 (and 0 else); (2) then draw a random string
from $\mathcal{D}_q^s$ (by choosing each $b_i$ from $\mathcal{D}_q^s$ independently). Consider a probabilistic
algorithm that, given access to such a random bit string, outputs an estimate
on the value of s. How well can it do?

Lemma 7. Any probabilistic algorithm that can guess the value of s with a
probability of error below 1/4 requires $\Omega(\varepsilon^{-2}/q)$ bit lookups on average.

Proof: By Yao’s minimax principle, we may assume that the algorithm is de- 
terministic and that the input is distributed according to T>. It is intuitively 
obvious that any algorithm might as well scan 6162 until it decides it has 
seen enough to produce an estimate of s. In other words, there is no need to 
be adaptive in the choice of bit indices to probe (but the running time itself 
can be adaptive). To see why is easy. An algorithm can be modeled as a binary 
tree with a bit index at each node and a 0/1 label at each edge. An adaptive 
algorithm may have an arbitrary set of bit indices at the nodes, although we can 
assume that the same index does not appear twice along any path. Each leaf is 
naturally associated with a probability, which is that of a random input from 
T> following the path to that leaf. The performance of the algorithm is entirely 
determined by these probabilities and the corresponding estimates of s. Because 
of the independence of the random biS, we can relabel the tree so that each path 
is a prefix of the same sequence of bit probes &162 • • •. This oblivious algorithm 
has the same performance as the adaptive one. 

We can go one step further and assume that the running time is the same for 
all inputs. Let t* be the expected number of probes, and let 0 < a < 1 be a small 
constant. With probability at most a, a random input takes time > t = t*/a. 
Suppose that the prefix of bits examined by the algorithm is 61 • • • 6„. If u < t, 
simply go on probing bu+i ■ ■ - bt without changing the outcome. If u > t, then 
stop at bt and output s = 1. Thus, by adding a to the probability of error, we 
can assume that the algorithm consists of looking up b\ - ■ - bt regardless of the 
input string. 

Let $p_s(b_1 \cdots b_t)$ be the probability that a random $t$-bit string chosen from $\mathcal{D}_q^s$
is equal to $b_1 \cdots b_t$. The probability of error satisfies

$$p_{\mathrm{err}} \ge \frac{1}{2} \sum_{b_1 \cdots b_t} \min_s \, p_s(b_1 \cdots b_t).$$

Obviously, $p_s(b_1 \cdots b_t)$ depends only on the number of ones in the string, so if
$p_s(k)$ denotes the probability that $b_1 + \cdots + b_t = k$, then

$$p_{\mathrm{err}} \ge \frac{1}{2} \sum_{k} \min_s \, p_s(k). \qquad (4)$$

By the normal approximation of the binomial distribution,

$$p_s(k) \sim \frac{1}{\sqrt{2\pi t q_s (1 - q_s)}} \, e^{-\frac{(k - t q_s)^2}{2 t q_s (1 - q_s)}}$$

as $t \to \infty$. This shows that $p_s(k) = \Omega(1/\sqrt{qt})$ over an interval $I_s$ of length
$\Omega(\sqrt{qt})$ centered at $t q_s$. If $q t \varepsilon^2$ is smaller than a suitable constant $\gamma_0$, then
$|t q_0 - t q_1|$ is small enough that $I_0 \cap I_1$ is itself an interval of length $\Omega(\sqrt{qt})$;
therefore $p_{\mathrm{err}} = \Omega(1)$. This shows that if the algorithm runs in expected time
$\gamma_0 \varepsilon^{-2}/q$, for some constant $\gamma_0 > 0$ small enough, then it will fail with probability
at least some absolute constant. By setting $\alpha$ small enough, we can make that
constant larger than $2\alpha$. This means that, prior to uniformizing the running
time, the algorithm must still fail with probability $\alpha$.

Note that by choosing $\gamma_0$ small enough, we can always assume that $\alpha > 1/4$.
Indeed, suppose by contradiction that even for an extremely small $\gamma_1$, there is
an algorithm that runs in time at most $\gamma_1 \varepsilon^{-2}/q$ and fails with probability $< 1/4$.
Then run the algorithm many times and take a majority vote. In this way we
can bring the failure probability below $\alpha$ for a suitable $\gamma_1 = \gamma_1(\alpha, \gamma_0) < \gamma_0$, and
therefore reach a contradiction. This means that an expected time lower than
$\varepsilon^{-2}/q$ by a large enough constant factor causes a probability of error at least
1/4. □
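The error bound above can be evaluated numerically for small cases. The following Python sketch is only an illustration: it assumes that the two coin biases are $q_s = q(1 + (-1)^s \varepsilon)$, which is not restated in this passage, and the function names are hypothetical. It computes $\frac{1}{2} \sum_k \min_s p_s(k)$ exactly for two binomial distributions.

```python
import math

def binom_pmf(t: int, k: int, p: float) -> float:
    """P[Bin(t, p) = k], computed in log space to avoid overflow for large t."""
    log_pmf = (math.lgamma(t + 1) - math.lgamma(k + 1) - math.lgamma(t - k + 1)
               + k * math.log(p) + (t - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def error_lower_bound(t: int, q: float, eps: float) -> float:
    """(1/2) * sum_k min_s p_s(k), assuming biases q_s = q * (1 + (-1)^s * eps)."""
    q0, q1 = q * (1 + eps), q * (1 - eps)
    return 0.5 * sum(min(binom_pmf(t, k, q0), binom_pmf(t, k, q1))
                     for k in range(t + 1))

if __name__ == "__main__":
    q, eps = 0.5, 0.1  # here eps^-2 / q = 200
    for t in (10, 200, 5000):
        print(t, error_lower_bound(t, q, eps))
```

With $q = 1/2$ and $\varepsilon = 0.1$, the bound stays large while $t$ is well below $\varepsilon^{-2}/q = 200$ and becomes negligible only once $t$ is much larger, which is the qualitative behaviour the lemma describes.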



Proof (lower bound for connected components): Consider the graph $G$ consisting
of a simple cycle of $n$ vertices $v_1, \ldots, v_n$. Pick $s \in \{0, 1\}$ at random and take a
random $n$-bit string $b_1 \cdots b_n$ with bits drawn independently from $\mathcal{D}_{1/2}^s$. Next,
remove from $G$ any edge $(v_i, v_{i+1 \bmod n})$ if $b_i = 0$. Because $\varepsilon > C/\sqrt{n}$, the standard
deviation of the number of components, which is $\Theta(\sqrt{n})$, is sufficiently smaller
than $\varepsilon n$ so that with overwhelming probability any two graphs derived from
$\mathcal{D}_{1/2}^0$ and $\mathcal{D}_{1/2}^1$ differ by more than $\varepsilon n/2$ in their numbers of connected components.
That means that any probabilistic algorithm that estimates the number of connected
components with an additive error of $\varepsilon n/2$ can be used to identify the correct $s$.
By the lemma above, this requires $\Omega(\varepsilon^{-2})$ edge probes into $G$ on average.
Replacing $\varepsilon$ by $2\varepsilon$ proves the theorem for graphs of degree $d = 2$. For arbitrary $d$,
we may simply add $d - 2$ loops to each vertex. Each linked list thus consists of two
“cycle” pointers and $d - 2$ “loop” ones. If we place the cycle pointers at random among
the loop ones, then it takes $\Omega(d)$ probes on average to hit a cycle pointer. If we
single out the probes involving cycle pointers, it is not hard to argue that these
probes are, alone, sufficient to solve the connected components problem on the graph
deprived of its loops: one expects at most $O(T/d)$ such probes and
therefore $T = \Omega(d\varepsilon^{-2})$. □
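The cycle construction in this proof is easy to simulate. The sketch below is illustrative only: it assumes each bit equals 1 with probability $(1 + (-1)^s \varepsilon)/2$, and all names are hypothetical. It deletes edge $i$ of an $n$-cycle whenever $b_i = 0$ and counts the connected components with a small union-find.

```python
import random

def components_after_removal(bits):
    """Components of an n-cycle after deleting edge (v_i, v_{i+1 mod n}) when bits[i] == 0."""
    n = len(bits)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, b in enumerate(bits):
        if b == 1:  # the edge survives, so merge its two endpoints
            parent[find(i)] = find((i + 1) % n)
    return len({find(v) for v in range(n)})

def sample_bits(n, eps, s, rng):
    """n independent bits, each 1 with probability (1 + (-1)^s * eps) / 2 (assumed bias)."""
    p = 0.5 * (1 + (-1) ** s * eps)
    return [1 if rng.random() < p else 0 for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(1)
    n, eps = 10000, 0.1
    c0 = components_after_removal(sample_bits(n, eps, 0, rng))
    c1 = components_after_removal(sample_bits(n, eps, 1, rng))
    print(c0, c1, c1 - c0, eps * n / 2)
```

For $n$ large compared with $\varepsilon^{-2}$, the two component counts differ by roughly $\varepsilon n$, so an estimator with additive error $\varepsilon n/2$ would reveal $s$.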



Proof (lower bound for the MST weight): Again we begin with the case $d = 2$. The
input graph $G$ is a simple path of $n$ vertices. Pick $s \in \{0, 1\}$ at random and take a
random $(n - 1)$-bit string $b_1 \cdots b_{n-1}$ with bits drawn independently from $\mathcal{D}_q^s$,
where $q = 1/w$. Assign weight $w$ (resp. 1) to the $i$-th edge along the path if $b_i = 1$
(resp. 0). The MST of $G$ has weight $n - 1 + (w - 1) \sum_i b_i$, and so its expectation is
$\Theta(n)$. Also, note that the difference $\Delta$ in expectations between drawing from
$\mathcal{D}_q^0$ or $\mathcal{D}_q^1$ is $\Theta(\varepsilon n)$.

Because $\varepsilon > C\sqrt{w/n}$, the standard deviation of the MST weight, which is
$\Theta(\sqrt{nw})$, is sufficiently smaller than $\Delta$ that with overwhelming probability any
two graphs derived from $\mathcal{D}_q^0$ and $\mathcal{D}_q^1$ differ by more than $\Delta/2$ in MST weight.
Therefore, any probabilistic algorithm that estimates the weight with a relative
error of $\varepsilon/D$, for some large enough constant $D$, can be used to identify the
correct $s$. By the lemma above, this means that $\Omega(w\varepsilon^{-2})$ probes into $G$ are
required on average.

For $d > 2$, simply join each vertex in the cycle to $d - 2$ others (say, at
distance $> 2$ to avoid introducing multiple edges) and, as usual, randomize the
ordering in each linked list. Assign weight $w + 1$ to the new edges. (Allowing the
maximum weight to be $w + 1$ instead of $w$ has no influence on the lower bound
we are aiming for.) Clearly none of the new edges are used in the MST, so the
problem is the same as before, except that we now have to find our way amidst
$d - 2$ spurious edges, which takes the complexity to $\Omega(dw\varepsilon^{-2})$. □
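As a sanity check on the quantities in this proof, the following sketch samples the weighted path for both values of $s$ and compares the observed gap in MST weight with its expectation. It is a hedged illustration: the bias $q(1 + (-1)^s \varepsilon)$ with $q = 1/w$ is an assumption, and the helper names are hypothetical.

```python
import random

def sample_bits(m, q, eps, s, rng):
    """m independent bits, each 1 with probability q * (1 + (-1)^s * eps) (assumed bias)."""
    p = q * (1 + (-1) ** s * eps)
    return [1 if rng.random() < p else 0 for _ in range(m)]

def path_mst_weight(bits, w):
    """A path is its own spanning tree, so its MST weight is the sum of its edge weights."""
    return sum(w if b else 1 for b in bits)

def expected_gap(n, w, eps):
    """Difference of expected MST weights between s = 0 and s = 1 when q = 1/w."""
    return 2 * eps * (n - 1) * (w - 1) / w

if __name__ == "__main__":
    rng = random.Random(0)
    n, w, eps = 20001, 4, 0.2
    w0 = path_mst_weight(sample_bits(n - 1, 1 / w, eps, 0, rng), w)
    w1 = path_mst_weight(sample_bits(n - 1, 1 / w, eps, 1, rng), w)
    print(w0 - w1, expected_gap(n, w, eps))
```

The expected gap $2\varepsilon(n - 1)(w - 1)/w$ is $\Theta(\varepsilon n)$, in line with the proof.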



5 Open Questions 

It is natural to ask what can be done if the max degree restriction is lifted. 
We have made some progress on the case of graphs of bounded mean degree. 
Our algorithm for the case of nonintegral weights requires extra time. Is this 
necessary? Can the ideas in this paper be extended to finding maximum weighted 
independent sets in general matroids? There are now a small number of examples 
of approximation problems that can be solved in sublinear time; what other 
problems lend themselves to sublinear approximation schemes? More generally, 
it would be interesting to gain a more global understanding of what can and 
cannot be approximated in sublinear time.

Abstract. The general asymmetric (and metric) TSP is known to be approximable
only to within an $O(\log n)$ factor, and is also known to be approximable
within a constant factor as soon as the metric is bounded. In
this paper we study the asymmetric and symmetric TSP problems with 
bounded metrics and prove approximation lower bounds of 101/100 and 
203/202, respectively, for these problems. We prove also approximation 
lower bounds of 321/320 and 743/742 for the asymmetric and symmetric 
TSP with distances one and two. 



1 Introduction 

A common special case of the Traveling Salesman Problem (TSP) is the metric 
TSP, where the distances between the cities satisfy the triangle inequality. The 
decision version of this special case was shown to be NP-complete by Karp,
which means that we have little hope of computing exact solutions in polynomial
time. Christofides has constructed an elegant algorithm approximating the
metric TSP within 3/2, i.e., an algorithm that always produces a tour whose
weight is at most a factor 3/2 from the weight of the optimal tour. For the
case when the distance function may be asymmetric, the best known algorithm
approximates the solution within $O(\log n)$, where $n$ is the number of cities.
As for lower bounds, Papadimitriou and Yannakakis have shown that there
exists some constant such that it is NP-hard to approximate, within that
constant, the TSP where the distances are constrained to be either one or two
(note that such a distance function always satisfies the triangle inequality).
This lower bound was improved by Engebretsen to $2805/2804 - \varepsilon$
for the asymmetric and $5381/5380 - \varepsilon$ for the symmetric TSP with
distances one and two. Böckenhauer et al. considered the symmetric TSP
with distances one, two and three, and were able to prove a lower bound of
$3813/3812 - \varepsilon$. (For a discussion of bounded metric TSP, see also Trevisan.)
It appears that the metric TSP lacks the good definability properties which
were needed (so far) for proving strong nonapproximability results. Therefore,
any new insights into explicit lower bounds here seem to be of considerable
interest.

Papadimitriou and Vempala recently announced lower bounds of $42/41 - \varepsilon$
and $129/128 - \varepsilon$ for the asymmetric and symmetric versions, respectively,
of the TSP with graph metric, but left the question of the approximability
for the case with bounded metric open. However, their proof contained
an error influencing the explicit constants. The corrected proof gives the new
constants $98/97 - \varepsilon$ and $234/233 - \varepsilon$. Apart from being
an interesting question on its own, it is conceivable that the special cases with
bounded metric are easier to approximate than the cases when the distance
between two points can grow with the number of cities in the instance. Indeed, the
asymmetric TSP with distances bounded by $B$ can be approximated within $B$
by just picking any tour as the solution and the asymmetric TSP with distances
one and two can be approximated within 17/12 [14]. The symmetric version of
the latter problem can be approximated within 7/6 [12].

In this paper, we consider the case when the metric contains only integer
distances between one and eight and prove a lower bound of $101/100 - \varepsilon$ for the
asymmetric case and $203/202 - \varepsilon$ for the symmetric case. This is an improvement
of an order of magnitude compared to the previous best known bounds of
$2805/2804 - \varepsilon$ and $3813/3812 - \varepsilon$ for this case, respectively. We also prove
that it is NP-hard to approximate the asymmetric TSP with distances one and
two within $321/320 - \varepsilon$, for any constant $\varepsilon > 0$. For the symmetric version of
the latter problem we show a lower bound of $743/742 - \varepsilon$. The previously best
known bounds for this case are $2805/2804 - \varepsilon$ and $5381/5380 - \varepsilon$, respectively [5].
Our proofs depend on explicit reductions from certain bounded dependency
instances of linear equations satisfiability. The main idea is to construct certain
uniform circles of equation gadgets and, in the second part, certain combined
hybrid circle constructions. The reductions for the symmetric case are omitted
from this extended abstract; they will appear in the full version of the paper.

Definition 1. The Asymmetric Traveling Salesman Problem (ATSP) is the fol- 
lowing minimization problem: Given a collection of cities and a matrix whose 
entries are interpreted as the distance from a city to another, find the shortest 
tour starting and ending in the same city and visiting every city exactly once. 

Definition 2. $(1,B)$-ATSP is the special case of ATSP where the entries in the
distance matrix obey the triangle inequality and the off-diagonal entries in the
distance matrix are integers between 1 and $B$.

2 The Hardness of $(1,B)$-ATSP

We reduce, similarly to Papadimitriou and Vempala, from Håstad's lower
bound for E3-Lin mod 2. Our construction consists of a circle of equation
gadgets testing odd parity. This is no restriction since we can easily transform a
test for even parity into a test for odd parity by flipping a literal. Three of the
edges in the equation gadget correspond to the variables involved in the parity
check. These edges are in fact gadgets, so-called edge gadgets, themselves. Edge
gadgets from different equation gadgets are connected to ensure consistency
among the edges representing a literal. This requires the number of negative
occurrences of a variable to be equal to the number of positive occurrences. This
is no restriction since we can duplicate every equation a constant number of
times and flip literals to reach this property.

Definition 3. E3-Lin mod 2 is the following maximization problem: Given an
instance of n variables and m equations over Z2 with exactly three unknowns in 
each equation, find an assignment to the variables that satisfies as many equa- 
tions as possible. 

Theorem 1 (Håstad). There exist instances of E3-Lin mod 2 with $2m$ equations
such that, for any constant $\varepsilon > 0$, it is NP-hard to decide if at most $\varepsilon m$ or
at least $(1 - \varepsilon)m$ equations are left unsatisfied by the optimal assignment. Each
variable in the instance occurs a constant number of times.

We describe our instance of $(1,B)$-ATSP by constructing a weighted directed
graph $G$ and then let the $(1,B)$-ATSP instance have the nodes of $G$ as cities.
For two nodes $u$ and $v$ in $G$, let $\ell(u, v)$ be the length of the shortest path from $u$
to $v$ in $G$. The distance between two cities $u$ and $v$ in the $(1,B)$-ATSP instance
is then defined to be $\min\{B, \ell(u, v)\}$.
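The capped shortest-path metric just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's own code: the function name and the edge-list input format are hypothetical, edge weights are assumed to be positive integers, and unreachable pairs are given distance $B$.

```python
from math import inf

def bounded_metric(n, edges, B):
    """Return d with d[u][v] = min(B, shortest directed path length from u to v)."""
    dist = [[0 if u == v else inf for v in range(n)] for u in range(n)]
    for u, v, w in edges:  # directed edge u -> v of weight w
        dist[u][v] = min(dist[u][v], w)
    # Floyd-Warshall all-pairs shortest paths
    for k in range(n):
        for u in range(n):
            for v in range(n):
                if dist[u][k] + dist[k][v] < dist[u][v]:
                    dist[u][v] = dist[u][k] + dist[k][v]
    # cap every off-diagonal distance at B
    return [[0 if u == v else min(B, dist[u][v]) for v in range(n)]
            for u in range(n)]

if __name__ == "__main__":
    path = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
    for row in bounded_metric(4, path, B=2):
        print(row)
```

Capping at $B$ preserves the triangle inequality, since $\min\{B, \ell(u,v)\} \le \min\{B, \ell(u,k)\} + \min\{B, \ell(k,v)\}$ whenever $\ell$ itself satisfies it.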

2.1 The Gadgets 

The gadgets are parametrized by the parameters a, b and d; they will be specified 
later. Our construction follows Papadimitriou and Vempala, but we use
a slightly different accounting method in our proofs. 

The equation gadget for equations of the form $x + y + z = 0$ is shown in Fig. 1.
The key property of this gadget is that there is a Hamiltonian path through the 
gadget only if zero or two of the ticked edges are traversed. To form the circle 
of equation gadgets, vertex A in one gadget coincides with vertex B in another 
gadget. 

The edge gadget is shown in Fig. 2. Each of the bridges is shared between
two different edge gadgets, one corresponding to a positive occurrence of the
literal and one corresponding to a negative occurrence. The precise coupling is
provided by a perfect matching in a $d$-regular bipartite multigraph $(V_1 \cup V_2, E)$
on $2k$ vertices with the following property: For any partition of $V_1$ into subsets
$S_1$, $U_1$ and $T_1$ and any partition of $V_2$ into subsets $S_2$, $U_2$ and $T_2$ such that there
are no edges from $U_1$ to $U_2$, the total number of edges from vertices in $T_1$ to
vertices in $T_2$ is greater than

$$\frac{\min\{k,\ |U_1| + |T_2| + |S_1| + |S_2|,\ |U_2| + |T_1| + |S_1| + |S_2|\}}{a + b} - \frac{|S_1| + |S_2|}{2}.$$

Fig. 1. The gadget for equations of the form $x + y + z = 0$. There is a Hamiltonian
path from A to B only if zero or two of the ticked edges, which are actually gadgets
themselves (Fig. 2), are traversed. The non-ticked edges have weight 1.



The purpose of this construction is to ensure that it is always optimal for the 
tour to traverse the graph in such a way that all variables are given consistent 
values. The edge gadget gives an assignment to an occurrence of a variable by 
the way it is traversed. 

Definition 4. We call an edge gadget where all bridges are traversed from left 
to right in Fig. 2 traversed and an edge gadget where all bridges are traversed
from right to left untraversed. All other edge gadgets are called semitraversed. 







Fig. 2. The edge gadget consists of d bridges. Each of the bridges is shared between
two different edge gadgets and consists of two undirected edges of weight a/2. The
leftmost directed edge above has weight 1; the directed edges leaving a bridge have
weight b.




Fig. 3. A traversed edge gadget represents the value 1.

Fig. 4. An untraversed edge gadget represents the value 0.



2.2 Proof of Correctness 

If we assume that the tour behaves nicely, i.e., that the edge gadgets are ei- 
ther traversed or untraversed, it is straightforward to establish a correspondence 
between the length of the tour and the number of unsatisfied equations. 

Lemma 1. The only way to traverse the equation gadget in Fig. 1 with a tour of
length 4 (if the edge gadgets count as length one for the moment) is to traverse
an even number of edge gadgets. All other locally optimal traversals have length 5.

Proof. It is easy to see that any tour traversing two ticked edges and leaving 
the third one untraversed has length 4. Any tour traversing one ticked edge and 
leaving the other two ticked edges untraversed has length at least 5. Strictly 
speaking, it is impossible to have three traversals since this does not result in a 
tour. However, we can regard the case when the tour leaves the edge gadget by 
jumping directly to the exit node of the equation gadget as a tour with three 
traversals; such a tour gives a cost of 5. 



Lemma 2. In addition to the length 1 attributed to the edge gadget above, the
length of a tour traversing an edge gadget in the intended way is $d(a + b)$.

Proof. Each bridge has length $a$, and every bridge must have one of the outgoing
edges traversed. Thus, the total cost is $d(a + b)$.



Lemma 3. Suppose that there are $2m$ equations in the E3-Lin mod 2 instance.
If the tour is shaped in the intended way, i.e., every edge gadget is either traversed
or untraversed, the length of the tour is $3md(a + b) + 4m + u$, where $u$ is the
number of unsatisfied equations resulting from the assignment represented by the
tour.

Proof. The length of the tour on an edge gadget is $d(a + b)$. There are three edge
gadgets corresponding to every equation and every bridge in the edge gadget is
shared between two equation gadgets. Thus, the length of the tour on the edge
gadgets is $2m \cdot 3d(a + b)/2 = 3md(a + b)$. The length of the tour on an equation
gadget is 4 if the equation is satisfied and 5 otherwise. Thus, the total length is
$3md(a + b) + 4m + u$.

The main challenge now is to prove that the above correspondence between the
length of the optimum tour and the number of unsatisfied equations holds also
when we drop the assumption that the tour is shaped in the intended way.

To count the excessive cost due to traversed non-edges of the graph defining
our $(1,B)$-ATSP instance, we note that every traversed non-edge of weight $w > 1$
corresponds to a path of length $\min\{w, B\}$ on edges in the graph defining the
instance. We thus reroute the tour along the corresponding path if $w < B$; if
$w > B$ we make the tour follow the first $B/2$ and last $B/2$ edges of the path and
then pretend that the tour does a jump of zero cost between these two vertices.
This produces something which is not a tour (we call it a pseudo-tour), since
some edges are traversed more than once and some vertices are connected to
more than two traversed edges. From now on, most of the reasoning concerns
this pseudo-tour. Our proof uses the following technical lemma:

Lemma 4 (Papadimitriou and Vempala). For $k$ sufficiently large, almost every
$d$-regular bipartite multigraph $(V_1 \cup V_2, E)$ on $2k$ vertices has the following
property: For any partition of $V_1$ into subsets $S_1$, $U_1$ and $T_1$ and any partition
of $V_2$ into subsets $S_2$, $U_2$ and $T_2$ such that there are no edges from $U_1$ to $U_2$,
the total number of edges from vertices in $T_1$ to vertices in $T_2$ is greater than

$$\frac{\min\{k,\ |U_1| + |T_2| + |S_1| + |S_2|,\ |U_2| + |T_1| + |S_1| + |S_2|\}}{8} - \frac{|S_1| + |S_2|}{2}.$$

Given the above lemma, the following sequence of lemmas gives a lower bound on
the extra cost, not counting the “normal” cost of $d(a + b)$ per edge gadget and
4 per equation gadget, that results from a non-standard behavior of the tour.

We have already seen that an unsatisfied equation adds an extra cost of 1. 
Edge gadgets that are either traversed or untraversed do not add any extra 
cost, except for the case when two traversed edge gadgets share a bridge;
this results in a bridge being traversed in both directions by the pseudo-tour. A 
pseudo-tour resulting from a proper TSP tour can never result in two untraversed 
edge gadgets sharing a bridge; this would imply a cycle of length 2a in the original 
TSP tour. 

Lemma 5. Two traversed edge gadgets that share a bridge give an extra cost
of $a + b$ to the length of the tour.

Proof. If two traversed edge gadgets are connected, there must be a bridge that
is traversed in both directions. Such a bridge gives an extra cost of $a + b$.

Lemma 6. Suppose that $B \ge 2\max\{a, b\}$. Then every semitraversed edge gadget
adds an extra cost of at least $\min\{a, b\}$ to the length of the tour.

Proof (sketch). We call a bridge balanced with respect to a pseudo-tour if there 
is at least one edge of the pseudo-tour adjacent to each endpoint of the bridge. 
Note that an unbalanced bridge always gives an extra cost of a, since the bridge 
must be traversed in both directions by the pseudo-tour. Thus, we always obtain 
an extra cost of two if any of the bridges are unbalanced.

Now assume that all bridges are balanced. Since the edge gadget is semitraversed,
the bridges are not all traversed in the same direction. Thus, there are two
adjacent bridges that are traversed in different directions. This gives an extra
cost of $b$.

Lemma 7. For $a = b = d = 4$, there exists a coupling of the equation gadgets
with the property that it can never be advantageous to have inconsistently
traversed equation gadgets.

Proof. Repeat the following argument for every variable x: 

Let $k$ be the number of occurrences of $x$ (and also the number of occurrences
of $\bar{x}$). Pick a bipartite graph on $2k$ vertices having the properties stated in
Lemma 4. We know by Lemma 4 that such a graph exists; since the graph has
constant size, we can try all possible graphs in constant time.

Put occurrences of $x$ at one side and occurrences of $\bar{x}$ on the other side of
the bipartite graph. Each vertex in the graph can be labeled as T, U or S,
depending on whether it is traversed, untraversed or semitraversed. Let $T_1$ be
the set of traversed positive occurrences and $T_2$ be the set of traversed negative
occurrences. Define $U_1$, $U_2$, $S_1$, and $S_2$ similarly. We can assume that
$|U_1| + |T_2| \le |U_2| + |T_1|$; otherwise we just change the indexing convention.

We now consider a modified tour where the positive occurrences are traversed
and the negative occurrences are untraversed. This decreases the cost of the tour by
at least $4(|S_1| + |S_2|) + 8|(T_1, T_2)|$, where $|(T_1, T_2)|$ denotes the number of edges
between $T_1$ and $T_2$, and increases it by $\min\{k, |S_1| + |S_2| + |U_1| + |T_2|\}$. But the
bipartite graph has the property that

$$8\,|(T_1, T_2)| > \min\{k,\ |U_1| + |T_2| + |S_1| + |S_2|\} - 4(|S_1| + |S_2|),$$

which implies that the cost of the tour decreases by this transformation. Thus, we
can assume that $x$ is given a consistent assignment by the tour.

Theorem 2. For any constant $\varepsilon > 0$, it is NP-hard to approximate (1,8)-ATSP
within $101/100 - \varepsilon$.

Proof. Given an instance of E3-Lin mod 2 with $2m$ equations where every variable
occurs a constant number of times, we construct the corresponding instance
of (1,8)-ATSP with $a = b = d = 4$. This can be done in polynomial time. By
the above lemma, we can assume that all edge gadgets are traversed consistently
in this instance. The assignment obtained from this traversal satisfies $2m - u$
equations if the length of the tour is $3md(a + b) + 4m + u$. If we could decide
if the length of the optimum tour is at most $(3d(a + b) + 4 + \varepsilon_1)m$ or at least
$(3d(a + b) + 5 - \varepsilon_2)m$, we could decide if at most $\varepsilon_1 m$ or at least $(1 - \varepsilon_2)m$ of the
equations are left unsatisfied by the corresponding assignment. But to decide
this is NP-hard by Theorem 1.
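The constant in Theorem 2 follows from plain arithmetic on the tour-length formula of Lemma 3. The short sketch below checks it with exact fractions; the function names are hypothetical, and the formula $3md(a + b) + 4m + u$ is taken from the text as given.

```python
from fractions import Fraction

def tour_length(m, d, a, b, u):
    """Length 3md(a+b) + 4m + u of a well-shaped tour, as stated in Lemma 3."""
    return 3 * m * d * (a + b) + 4 * m + u

def gap_ratio(d, a, b):
    """Limit, as the epsilons tend to 0, of (3d(a+b) + 5 - eps2) / (3d(a+b) + 4 + eps1)."""
    base = 3 * d * (a + b)
    return Fraction(base + 5, base + 4)

if __name__ == "__main__":
    print(tour_length(m=1, d=4, a=4, b=4, u=0))  # per-m length with u = 0
    print(gap_ratio(4, 4, 4))                    # hardness threshold
```

With $a = b = d = 4$ the per-$m$ base length is $3 \cdot 4 \cdot 8 + 4 = 100$, so the two cases differ by a factor approaching $101/100$.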

In the full version of this paper, we also prove the following theorem: 

Theorem 3. For any constant $\varepsilon > 0$, it is NP-hard to approximate (1,8)-TSP
within $203/202 - \varepsilon$.

3 The Hardness of (1,2)-ATSP

We apply the construction used by Berman and Karpinski [2] to prove stronger
hardness results for instances of several combinatorial optimization problems
where the number of occurrences of every variable is bounded by some constant.
In particular, [2] devises a reduction from systems of linear equations mod two
with exactly three unknowns in each equation to a problem called Hybrid with
the following two properties: Each equation contains either two or three literals
and each literal occurs exactly three times.

Definition 5. Hybrid is the following maximization problem: Given a system
of linear equations mod 2 containing $n$ variables, $m_2$ equations with exactly two
unknowns, and $m_3$ equations with exactly three unknowns, find an assignment
to the variables that satisfies as many equations as possible.

Theorem 4 ([2]). There exist instances of Hybrid with $42\nu$ variables,
$60\nu$ equations with two variables, and $2\nu$ equations with three variables such
that:

1. Each variable occurs exactly three times.

2. For any constant $\varepsilon > 0$, it is NP-hard to decide if at most $\varepsilon\nu$ or at least
$(1 - \varepsilon)\nu$ equations are left unsatisfied.

Since we adopt the construction of Berman and Karpinski [2], we can partly rely
on their main technical lemmas, which simplifies our proof of correctness. 

On a high level, the (1,2)-ATSP instance in our reduction consists of a circle 
formed by equation gadgets representing equations of the form x + y + z = 0 
and $x + y = 1$. These gadgets are coupled in a way ensuring that the three
occurrences of a variable are given consistent values. In fact, the instances of 
Hybrid produced by the Berman-Karpinski construction have a very special 
structure. Every variable occurs in at least two equations with two unknowns, 
and those equations are all equivalences, i.e., equations of the form x + y = 0. 
Since our gadget for equations with two unknowns tests odd parity, we have 
to rewrite those equations as $x + \bar{y} = 1$ instead. Similarly, the equations of
the form x + y + z = 1 must be rewritten with one variable negated since our 
gadgets for equations with three unknowns only test even parity. Turning to 
the coupling needed to ensure consistency, we have three occurrences of every 
variable. Since we do not have any gadgets testing odd parity for three variables 
or even parity for two variables, we may have to negate some of the occurrences. 
We now argue that there are either one or two negated occurrences of every 
variable. The Hybrid instance produced by the Berman-Karpinski construction 
can be viewed as a collection of wheels where the nodes correspond to variables 
and edges to equations. The edges within a wheel all represent equations with 
two unknowns, while the equations with three unknowns are represented by 
hyperedges connecting three different wheels. The equations corresponding to 
the edges forming the perimeter of the wheel can be written as $x_1 + \bar{x}_2 = 1$,
$x_2 + \bar{x}_3 = 1$, ..., $x_{k-1} + \bar{x}_k = 1$, and $x_k + \bar{x}_1 = 1$, which implies that there is at
least one negated and at least one unnegated occurrence of each variable.

Corollary 1. There exist instances of Hybrid with $42\nu$ variables, $60\nu$ equations
of the form $x + y = 1 \bmod 2$, and $2\nu$ equations of the form $x + y + z = 0 \bmod 2$
or $x + y + \bar{z} = 0 \bmod 2$ such that:

1. Each variable occurs exactly three times. 

2. There is at least one positive and at least one negative occurrence of each 
variable. 

3. For any constant $\varepsilon > 0$, it is NP-hard to decide if at most $\varepsilon\nu$ or at least
$(1 - \varepsilon)\nu$ equations are left unsatisfied.

To prove our hardness result for (1,2)-ATSP, we reduce instances of Hybrid of the 
form described in Corollary 1 to instances of (1,2)-ATSP and prove that, given a
tour in the (1,2)-ATSP instance, it is possible to construct an assignment to the 
variables in the original Hybrid instance with the property that the number of 
unsatisfied equations in the Hybrid instance is related to the length of the tour 
in the (1,2)-ATSP instance. 

To describe a (1,2)-TSP instance, it is enough to specify the edges of weight 
one. We do this by constructing a graph G and then let the (1,2)-TSP instance 
have the nodes of G as cities. The distance between two cities u and v is defined 
to be one if $(u, v)$ is an edge in $G$ and two otherwise. To compute the weight of
a tour, it is enough to study the parts of the tour traversing edges of G. In the 
asymmetric case G is a directed graph. 

Definition 6. We call a node where the tour leaves or enters G an endpoint. A 
node with the property that the tour both enters and leaves G in that particular 
node is called a double endpoint and counts as two endpoints. 

If c is the number of cities and 2e is the total number of endpoints, the weight 
of the tour is c + e since every edge of weight two corresponds to two endpoints. 
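The identity "weight $= c + e$" can be checked directly on small examples. The sketch below is illustrative and uses hypothetical names: a tour is a list of cities visited cyclically, `edges` is the set of directed weight-one edges of $G$, and every step of the tour that is not an edge of $G$ costs two and contributes two endpoints.

```python
def tour_weight(tour, edges):
    """Weight of a cyclic tour when steps along edges of G cost 1 and all others cost 2."""
    c = len(tour)
    # each step outside G leaves G at one node and re-enters at another: two endpoints
    jumps = sum(1 for i in range(c)
                if (tour[i], tour[(i + 1) % c]) not in edges)
    endpoints = 2 * jumps
    e = endpoints // 2
    return c + e

if __name__ == "__main__":
    g = {(0, 1), (1, 2), (2, 3)}          # a directed path on four cities
    print(tour_weight([0, 1, 2, 3], g))   # follows the path, then one jump back
    print(tour_weight([0, 2, 1, 3], g))   # uses no edge of G at all
```

In the first tour only the closing step $(3, 0)$ leaves $G$, so the weight is $4 + 1$; in the second every step costs two.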

3.1 The Gadgets 

The equation gadget for equations of the form $x + y + z = 0$ is shown in Fig. 1;
it is the same gadget as in the $(1,B)$ case. However, the ticked edges now represent
a different structure.

The equation gadget for equations of the form $x + y = 1$ is shown in Fig. 5.
The key property of this gadget is that there is a Hamiltonian path through the
gadget only if one of the ticked edges is traversed.




Fig. 5. The gadget for equations of the form x + y = 1. There is a Hamiltonian path 
from A to B only if one of the ticked edges is traversed. 



The ticked edges in the equation gadgets are syntactic sugar for a construction
ensuring consistency among the three occurrences of each variable. As we
noted above, either one or two of the occurrences of a variable are negated. The
construction in Fig. 6 ensures that the occurrences are given consistent values,
i.e., that either $x = 0$ and $\bar{x} = 1$, or $x = 1$ and $\bar{x} = 0$. If there is one negated
occurrence of a variable, the upper part of the gadget connects with that occur- 
rence and the lower part connects with the two unnegated occurrences. If there 
are two negated occurrences, the situation is reversed. 




Fig. 6. The gadget ensuring consistency for a variable. If there are two positive occur- 
rences of the variable, the ticked edges corresponding to those occurrences are repre- 
sented by the parts enclosed in the dotted curves and the ticked edge corresponding 
to the negative occurrence is represented by the part enclosed in the dashed curve. If 
there are two negative occurrences, the roles are reversed. 



3.2 Proof of Correctness 

We want to prove that every unsatisfied equation has an extra cost of one
associated with it. At first, it would seem that this is very easy: the gadget in
Fig. 1 is traversed by a path of length four if the equation is satisfied and a path
of length at least five otherwise; the gadget in Fig. 5 is traversed by a path of
length one if the equation is satisfied and a path of length at least two otherwise;
and the gadget in Fig. 6 ensures consistency and is traversed by a tour of length
six, not counting the edges that were accounted for above. Unfortunately, things
are more complicated than this. Due to the consistency gadgets, the tour can 
leave a ticked edge when it is half-way through it, which forces us to be more 
careful in our analysis. 

We count the number of endpoints that occur within the gadgets; each end- 
point gives an extra cost of one half. We say that an occurrence of a literal is 
traversed if both of its connecting edges are traversed, untraversed if none of its
connecting edges are traversed, and semitraversed otherwise. To construct an 
assignment to the literals, we use the convention that a literal is true if it is 
either traversed or semitraversed. We need to show that there are two endpoints 
in gadgets that are traversed in such a way that the corresponding assignment 
to the literals makes the equation unsatisfied. The following lemmas are easy, 
but tedious, to verify by case analysis; we omit the proofs from this extended
abstract: 

Lemma 8. It is locally optimal to traverse both bridges, i.e., both pairs of
undirected edges, in the consistency gadget.

Lemma 9. Every semitraversed occurrence introduces at least one endpoint.



Lemma 10. It is always possible to change a semitraversed occurrence into a 
traversed one without introducing any endpoints in the consistency gadget. 



Lemma 11. A “satisfying traversal” of the gadget in Fig. 5 has length 1; all
other locally optimal traversals have length at least 2, i.e., contain at least two 
endpoints within the gadget. 



Lemma 12. A “satisfying traversal” of the gadget in Fig. 1 has length 4; all
other locally optimal traversals have length at least 5, i.e., contain at least two 
endpoints within the gadget. 

We also need to prove that the gadget we use for consistency actually implements 
consistency. 

Lemma 13. The gadget in Fig. 6 ensures consistency and is traversed by a tour
of length 6, not counting the edges or endpoints that were accounted for in the 
above lemmas. 

By combining the above lemmas, we have shown the following connection be- 
tween the length of an optimum tour and the number of unsatisfied equations 
in the corresponding instance of Hybrid. 

Theorem 5. Suppose that we are given an instance of Hybrid with $n$ variables,
$m_2$ equations of the form $x + y = 1 \bmod 2$, and $m_3$ equations of the form
$x + y + z = 0 \bmod 2$ or $x + y + \bar{z} = 0 \bmod 2$ such that:

1. Each variable occurs exactly three times. 

2. There is at least one positive and at least one negative occurrence of each 
variable. 

Then we can construct an instance of (1,2)-ATSP with the property that a tour
of length 6n + m_2 + 4m_3 + u corresponds to an assignment satisfying all but u
of the equations in the Hybrid instance.



Corollary 2. For any constant ε > 0, it is NP-hard to approximate (1,2)-ATSP
within 321/320 - ε.

Proof. We connect Theorem 5 with Corollary 1 and obtain an instance of
(1,2)-ATSP with the property that a tour of length 6 · 42ν + 60ν + 4 · 2ν + u =
320ν + u corresponds to an assignment satisfying all but u of the equations in
the Hybrid instance. Since, for any constant ε' > 0, it is NP-hard to distinguish
the cases u ≤ ε'ν and u ≥ (1 - ε')ν, it is NP-hard to approximate (1,2)-ATSP
within 321/320 - ε for any constant ε > 0.

4 Conclusions

It should be possible to improve the reduction by eliminating the vertices that
connect the equation gadgets for x + y + z = {0, 1} with each other. This reduces
the cost of those equation gadgets by one, which improves our bounds, but only
by a minuscule amount. The big bottleneck, especially in the (1,2) case, is the
consistency gadgets. If, for the asymmetric case, we were able to decrease their
cost to four instead of six, we would improve the bound to 237/236 - ε;
if we could decrease the cost to three, the bound would become 195/194 - ε. We
conjecture that some improvement for the (1,2) case is still possible along these
lines.
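
The bounds quoted above can be rechecked with a few lines of arithmetic. The sketch below is not part of the paper: it assumes the per-unit instance sizes that appear in the proof of Corollary 2 (42 variables, 60 two-variable equations and 2 three-variable equations, the latter costing 4 each), and the helper name `bound` is hypothetical.

```python
from fractions import Fraction


def bound(consistency_cost, n=42, m2=60, m3=2, three_var_cost=4):
    """Return (L + 1) / L, where L is the tour length per unit of the instance.

    Assumed per-unit counts (taken from the proof of Corollary 2): n variables,
    m2 two-variable equations of cost 1 and m3 three-variable equations.
    """
    base = consistency_cost * n + m2 + three_var_cost * m3
    return Fraction(base + 1, base)


# Consistency gadget of cost 6 (as in the paper), then the hypothetical 4 and 3.
for cost in (6, 4, 3):
    print(cost, bound(cost))
```

With a consistency cost of 6 this reproduces 321/320, and costs of 4 and 3 give the conjectured 237/236 and 195/194.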

Acknowledgments. We thank Santosh Vempala for many clarifying discus- 
sions on the subject of this paper. 

Abstract. Several combinatorial optimization problems can be approximated
using algorithms based on semidefinite programming. In many of these
algorithms a semidefinite relaxation of the underlying problem is solved,
yielding an optimal vector configuration v_1, ..., v_n. This vector
configuration is then rounded into a {0, 1} solution. We present a procedure
called RPR² (Random Projection followed by Randomized Rounding) for rounding
the solution of such semidefinite programs. We show that the random
hyperplane rounding technique introduced by Goemans and Williamson, and its
variant that involves outward rotation, are both special cases of RPR². We
illustrate the use of RPR² by presenting two applications. For Max-Bisection
we improve the approximation ratio. For Max-Cut, we improve the tradeoff
curve (presented by Zwick) that relates the approximation ratio to the size
of the maximum cut in a graph.



1 Introduction 

For NP-hard maximization problems, we are interested in polynomial time
approximation algorithms that for every instance produce solutions whose value
is guaranteed to be within a ratio of at least α from the value of the optimal
solution. The parameter 0 < α < 1 is known as the approximation ratio of the
algorithm, and the larger α is, the better.

A common method for obtaining an approximation algorithm for a combi- 
natorial optimization problem is based on linear programming: 

1. Formulate the problem as an integer linear program. 

2. Relax the problem to a linear program. 

3. Solve the relaxation in polynomial time, obtaining a fractional solution
x_1, ..., x_n.

4. Round the fractional solution to a 0/1 solution. 

There are several approaches to rounding a fractional solution x_1, ..., x_n, and
the approach to choose depends on the problem. Two common approaches are:
(a) Threshold rounding, in which a threshold t is set and each variable x_i is
rounded to 1 if x_i > t, and to 0 otherwise (this approach is used for the Vertex
Cover problem in [Hoc82]). (b) Randomized rounding, in which a (monotone)
rounding function f : R → [0, 1] is chosen. Each variable x_i is rounded
independently to 1 with probability f(x_i) and to 0 otherwise (randomized
rounding was introduced in [RT87], and used for example in the approximation of
the Max-SAT problem [GW94]). When the rounding function f is a threshold function
(0 below the threshold and 1 above it), we get threshold rounding as a special
case of randomized rounding.
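
To make the relation between the two approaches concrete, the following minimal sketch (our illustration, with hypothetical helper names `randomized_round` and `threshold_function`) rounds a fractional solution with an arbitrary rounding function f. Plugging in a 0/1 step function makes the outcome deterministic, which is exactly threshold rounding.

```python
import random


def randomized_round(xs, f, rng=random):
    """Round each x to 1 with probability f(x), independently, and to 0 otherwise."""
    return [1 if rng.random() < f(x) else 0 for x in xs]


def threshold_function(t):
    """Rounding function that is 0 at or below the threshold t and 1 above it."""
    return lambda x: 1.0 if x > t else 0.0


xs = [0.1, 0.5, 0.75, 0.9]
# f takes only the values 0 and 1, so no randomness is left in the outcome.
print(randomized_round(xs, threshold_function(0.5)))
# With the identity as rounding function the outcome is genuinely random.
print(randomized_round(xs, lambda x: x))
```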

Goemans and Williamson [GW95] successfully extended this approach to
semidefinite programming. They use a random hyperplane rounding technique. In
our presentation below we break this rounding technique into two steps (steps 4
and 5).

1. Formulate the problem as an integer quadratic program. 

2. Relax the problem to a semidefinite program. 

3. Solve the relaxation in polynomial time, obtaining a vector solution
v_1, ..., v_n.

4. Project the vector solution on a random line through the origin, obtaining
a fractional solution x_1, ..., x_n. The value x_i is defined to be the
(directed) distance of the projection of the vector v_i from the origin.

5. Round the fractional solution x_1, ..., x_n to a 0/1 solution using threshold
rounding. (The threshold chosen by Goemans and Williamson is 0, rounding
vectors with positive projection to 1 and negative projection to 0.)

Hence both the linear programming approach and the semidefinite program- 
ming approach eventually round a fractional solution to a 0/1 solution. The 
main difference in this respect is that for semidefinite programming threshold 
rounding has always been used at this stage, whereas for linear programming it 
has often been the case that randomized rounding is preferred. 

In this paper we study the use of a randomized rounding procedure instead of
threshold rounding for the last step of the semidefinite programming approach.
We call this rounding technique RPR² (random projection, randomized rounding)
for short. The main contribution of this paper is in adding RPR² to the
“tool kit” of rounding techniques for semidefinite programs. To achieve this, we
do several things: (a) We show how the tool can be used. This includes methods
for choosing the rounding function f, and methods for analyzing (or lower
bounding) the resulting approximation ratio. (b) We identify classes of
optimization problems for which RPR² has the potential to improve the known
approximation ratios (or possibly even to achieve approximation ratios that match
the integrality gap of the semidefinite program). (c) We illustrate the
usefulness of RPR² by improving the known approximation ratios for some of these
problems.

We now go on to discuss the types of problems for which RPR² may be useful.
For simplicity and concreteness, we shall concentrate on variants of the Max-Cut
problem. Given a graph G = (V, E), the Max-Cut problem is the problem
of finding a maximum cut of G (i.e., a partition of the vertex set V into two
sets (U, V \ U) that maximizes the number of edges with one endpoint in U
and the other in V \ U). Goemans and Williamson [GW95] use the method
described above (with semidefinite programming and threshold rounding) to
obtain a partition (U, V \ U) of value at least α ≈ 0.87856 times the value of the
maximum cut in G. Recently in [FS01] it was shown that this approximation
ratio matches the integrality ratio (for the particular semidefinite program used
by Goemans and Williamson), and hence we shall not try to improve it (at
least not in this paper). Instead we shall consider special cases of Max-Cut for
which threshold rounding (that is, the random hyperplane) produces a solution
that is clearly not optimal. A simple sufficient condition for a solution not to
be optimal is that it fails to be locally optimal. Call a vertex misplaced if most
of its neighbors lie on the same side of the cut as the vertex itself. A solution
is locally optimal if it does not have misplaced vertices. Clearly, a solution
that is not locally optimal can be improved by having misplaced vertices change
sides. For some instances of Max-Cut, it can be shown that if the approximation
ratio of the Goemans and Williamson algorithm is indeed as bad as α ≈ 0.87856,
then necessarily the solution produced has (a substantial number of) misplaced
vertices. Hence the approximation ratio can be improved by adding a local
correction step to the algorithm that moves vertices from side to side until the
solution becomes locally optimal. The questions that remain are how to best guide
the local correction step in its choices, and how to analyze the effect of the
local correction step. In some cases, RPR² answers both questions simultaneously.
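
The plain local correction step described above can be sketched as follows. This is only an illustration under our own conventions, not the procedure analyzed in the paper: the graph is an adjacency dictionary, and a vertex counts as misplaced when strictly more than half of its neighbors share its side. The names `local_correction` and `cut_size` are hypothetical.

```python
def local_correction(adj, side):
    """Flip misplaced vertices until none remain.

    adj maps each vertex to a list of its neighbors (undirected, simple graph);
    side maps each vertex to 0 or 1. A vertex is treated as misplaced when
    strictly more than half of its neighbors are on its own side.
    """
    side = dict(side)
    improved = True
    while improved:
        improved = False
        for v, nbrs in adj.items():
            same = sum(1 for u in nbrs if side[u] == side[v])
            if 2 * same > len(nbrs):
                side[v] = 1 - side[v]  # each flip strictly increases the cut
                improved = True
    return side


def cut_size(adj, side):
    """Number of edges with endpoints on different sides (each edge counted once)."""
    return sum(1 for v in adj for u in adj[v] if v < u and side[v] != side[u])


# A 4-cycle with every vertex on the same side: the cut is empty at the start.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
corrected = local_correction(adj, {v: 0 for v in adj})
print(corrected, cut_size(adj, corrected))
```

The loop terminates because every flip increases the number of cut edges, which is bounded by |E|.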

Consider light Max-Cut, the problem of Max-Cut on instances where the
maximum cut in a graph is not very large (below a 0.844 fraction of the edges
in the graph). For this case Zwick [Zwi99] showed how to obtain approximation
ratios above α ≈ 0.87856, using the tool of outward rotation. We propose to
use RPR² instead. We observe that for instances of light Max-Cut for which
threshold rounding is at its worst, there are misplaced vertices. (This may not
be obvious to the reader at this point, but is true nevertheless.) Hence
necessarily the approximation ratio can be improved, and the only question is by
how much. By a suitable choice of a rounding function f, we can use the RPR²
technique to give solutions that are locally optimal, and moreover, we can lower
bound the approximation ratio that is obtained. As we shall see, this
approximation ratio is better than the one obtained by Zwick using outward
rotations. This is no coincidence, because as we shall show, outward rotation
can be viewed as a special case of RPR², but with a choice of rounding function
f that produces a solution that is not locally optimal.

The use of RPR² above can be viewed as using a threshold scheme (random
hyperplane) followed by a randomized local correction step (in the randomized
rounding phase vertices change sides with probabilities related to their distance
from the hyperplane). Hence the choice of rounding function f guides the local
correction step, and the RPR² methodology gives us a way of quantifying the
effect of using a local correction step. Moreover, it is straightforward to
derandomize the local correction step (using the method of conditional
probabilities), giving a local correction step that is deterministic.

It is fair to remark that it is not advantageous in all cases to use the RPR²
approach in order to guide and analyze local corrections. For example, local
corrections were successfully used in [FKL00] to improve the approximation ratio
for Max-Cut for graphs of bounded degree. We do not think that the analysis
of [FKL00] can be cast in the terminology of RPR².

Another class of problems for which RPR² may be useful is cases where
threshold rounding produces infeasible solutions. Given a graph G = (V, E), the
Max-Bisection problem is the problem of finding a partition (U, V \ U) of V
into two equally sized sets (i.e., a bisection of V) that maximizes the number of
edges cut by the partition. The algorithm of [GW95] (described above) for the
Max-Cut problem on G will probably yield a partition (U, V \ U) which is not
a bisection (i.e., |U| ≠ |V \ U|). Hence, in order to obtain a feasible partition
of V (i.e., a bisection), the partition (U, V \ U) must be modified (e.g., by
moving vertices from the larger side of the partition to the smaller one until
both sides are equal). It is very difficult to analyze the effect of this
additional step. There has been a sequence of papers [FJ97, Ye99, HZ00], each
improving the bounds of the previous papers. We observe that RPR² is a natural
rounding technique to use in this context, because by an appropriate choice of
the rounding function f (possibly based on the outcome of the random
projection), we can guarantee that the two sides are of (nearly) the same size.
We show a particular choice of rounding function f that modestly improves the
known approximation ratio for Max-Bisection. We suspect that there are choices
of f that give more dramatic improvements, though we are still struggling with
their analysis.
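
The repair step mentioned above (moving vertices from the larger side to the smaller one) can be sketched as follows. The greedy rule used here, which always moves the vertex whose move costs the fewest cut edges, is our own illustrative assumption and is not claimed to be the rule analyzed in the cited papers; the name `rebalance` is hypothetical.

```python
def rebalance(adj, side):
    """Move vertices from the larger side to the smaller one until sizes are equal.

    adj maps each vertex to its list of neighbors; side maps each vertex to 0 or 1.
    Assumption for illustration: the vertex moved next is the one whose move
    loses the fewest cut edges.
    """
    if len(side) % 2 != 0:
        raise ValueError("a bisection needs an even number of vertices")
    side = dict(side)

    def loss(v):
        # Cut edges lost minus cut edges gained if v switches sides.
        cut = sum(1 for u in adj[v] if side[u] != side[v])
        return cut - (len(adj[v]) - cut)

    while True:
        ones = [v for v in side if side[v] == 1]
        zeros = [v for v in side if side[v] == 0]
        if len(ones) == len(zeros):
            return side
        large = ones if len(ones) > len(zeros) else zeros
        v = min(large, key=loss)
        side[v] = 1 - side[v]


# A path on four vertices, with all vertices initially on side 1.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(rebalance(adj, {0: 1, 1: 1, 2: 1, 3: 1}))
```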



A function f is called s-linear if for some s > 0 it is of the form f(x) = 0 for
x ≤ -s, f(x) = 1 for x ≥ s, and f(x) = 1/2 + x/(2s) for -s < x < s. As concrete
examples of our results, we have the following theorem:



Theorem 1. Using RPR² with an s-linear rounding function f, one obtains
the following approximation ratios.



- For light Max-Cut for instances in which the optimal cut contains at most 
0.6 of the edges, the ratio is at least 0.9128. (Previous best bound was be- 
low 0.9119.) 

- For Max-Bisection the ratio is at least 0.7027. (Previous best bound was 
below 0.7017.) 
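
For concreteness, an s-linear rounding function (zero up to -s, one from s on, and linear in between) can be written as a short closure. This is a sketch of the definition only; the helper name `s_linear` is ours.

```python
def s_linear(s):
    """Return the s-linear rounding function for a given s > 0."""
    if s <= 0:
        raise ValueError("s must be positive")

    def f(x):
        if x <= -s:
            return 0.0
        if x >= s:
            return 1.0
        return 0.5 + x / (2 * s)  # linear ramp through (0, 1/2)

    return f


f = s_linear(0.5)
print(f(-1.0), f(0.0), f(0.25), f(1.0))
```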

Our paper is structured as follows. In Section 2 we review the random
hyperplane and outward rotation rounding techniques. In Section 3 we define RPR²
and show that outward rotation is a special case of RPR². In Section 4 we
analyze the use of RPR² on the Max-Cut and Max-Bisection problems. Finally,
in Section 5 we offer some concluding remarks. Due to space limitations, the
results of our work are presented without detailed proof. In most cases, [FL01]
(the extended version of our work) contains the missing details.



2 SDP Relaxation of Max-Cut and Various Roundings 

Consider the Max-Cut problem on a graph G = (V, E) with |V| = n. It can be
represented as a quadratic integer program:

(QI-MC)  Maximize  \sum_{e_{ij} \in E} \frac{1 - x_i x_j}{2}

subject to:  x_i \in \{-1, 1\}  for 1 \le i \le n

The above program can be understood as follows. With each vertex i ∈ V we
associate a variable x_i, and the value of x_i (which is either +1 or -1) indicates
on which side of the cut the respective vertex is placed. For each edge e_ij ∈ E, if
x_i ≠ x_j (corresponding to the case in which e_ij is cut) then the value of
(1 - x_i x_j)/2 is 1, and if x_i = x_j then this value is 0.
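
As a quick illustration of this objective, the snippet below evaluates the sum of (1 - x_i x_j)/2 over the edges for a ±1 assignment; the function name `cut_value` and the small triangle instance are ours.

```python
def cut_value(edges, x):
    """Objective of (QI-MC): sum of (1 - x_i * x_j) / 2 over the edges.

    edges is a list of vertex pairs and x maps each vertex to +1 or -1.
    """
    assert all(v in (-1, 1) for v in x.values())
    # 1 - x_i * x_j is either 0 or 2, so integer division by 2 is exact.
    return sum((1 - x[i] * x[j]) // 2 for i, j in edges)


triangle = [(1, 2), (2, 3), (1, 3)]
print(cut_value(triangle, {1: 1, 2: -1, 3: 1}))
```

In the triangle, placing vertex 2 alone on one side cuts the two edges incident to it, and the third edge contributes 0.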

The requirement x_i ∈ {-1, 1} can be relaxed by representing each variable
x_i by a unit n-dimensional vector v_i ∈ S^n (here S^n is the unit sphere) and the
multiplication x_i · x_j by the inner product ⟨v_i, v_j⟩.

(SDP-MC)  Maximize  \sum_{e_{ij} \in E} \frac{1 - \langle v_i, v_j \rangle}{2}

subject to:  v_i \in S^n  for 1 \le i \le |V|



As every solution of (QI-MC) is also a solution of (SDP-MC), the value of
(SDP-MC) is at least as large as that of (QI-MC). (SDP-MC) can be solved
(up to arbitrary precision) in polynomial time using semidefinite programming
(see [GW95]). A solution to (SDP-MC) is a set of unit vectors in R^n, rather
than a cut of G. To obtain a cut (U, V \ U) of G we round the set of optimal
vectors v_1, ..., v_n obtained by solving (SDP-MC). One such rounding technique,
presented by Goemans and Williamson [GW95], is the random hyperplane rounding
technique.



Let r = (r_1, ..., r_n) be a random variable with an n dimensional standard
normal distribution (i.e., each coordinate is an independent random variable with
standard normal distribution). It can be seen that r is spherically symmetric,
namely the direction specified by the vector r ∈ R^n is uniformly distributed (see
for instance Ei n m)- In the random hyperplane rounding technique a random 
vector r of the above distribution is chosen and the vectors v_1, ..., v_n are
partitioned into two sets according to the sign of the inner product ⟨v_i, r⟩.
That is, a cut (U, V \ U) is defined by the set U = {i | ⟨v_i, r⟩ > 0}.

Using the semidefinite program (SDP-MC) and the random hyperplane
rounding technique, [GW95] obtain a 0.87856 approximation ratio for the Max-Cut
problem. A number of other approximation algorithms for various problems
have been designed using semidefinite programming and variations of the random
hyperplane rounding technique (for example [FG95, KMS98, FJ97, KZ97, Zwi99]).
In some of these algorithms, the vectors v_1, ..., v_n obtained by solving a
semidefinite relaxation are rearranged prior to the use of random hyperplane
rounding.

One method used to rearrange the vectors v_1, ..., v_n is outward rotation
[Nes98, Ye99, Zwi99]. Let γ ∈ [0, 1]. Given a set of vectors v_1, ..., v_n in R^n,
obtained by the solution of a semidefinite program, the γ-outward rotation of
v_1, ..., v_n is a set of new vectors ṽ_1, ..., ṽ_n in R^{2n}. The vector ṽ_i is
defined to be √(1 - γ) v_i + √γ e_i ∈ R^{2n}, where the original vectors v_i are
viewed as vectors in R^{2n} (R^n being a subspace of R^{2n}) and e_1, ..., e_n are
a set of orthonormal vectors in R^{2n} that are also orthogonal to the vectors
v_1, ..., v_n. In general, when γ = 1 the γ-outward rotation of the vectors
v_1, ..., v_n is a new vector configuration in which all vectors are orthogonal,
and when γ = 0 the γ-outward rotation does not change the vectors v_1, ..., v_n.
For intermediate γ, the γ-outward rotation is somewhere in between. Outward
rotation has been used in the design of approximation algorithms for special
instances of the Max-Cut problem, and other problems closely related to Max-Cut
(for example [Zwi99, Ye99, HZ00]).
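
A small sketch of outward rotation, under the assumption that the extra orthonormal vectors e_1, ..., e_n are simply the unit vectors of n appended coordinates (any orthonormal family orthogonal to the v_i would do). The helper names are ours. The example checks numerically that unit vectors stay unit vectors and that inner products between distinct vectors shrink by the factor 1 - γ.

```python
import math


def outward_rotation(vectors, gamma):
    """Return the gamma-outward rotation of n vectors in R^n as vectors in R^(2n)."""
    n = len(vectors)
    a, b = math.sqrt(1 - gamma), math.sqrt(gamma)
    rotated = []
    for i, v in enumerate(vectors):
        assert len(v) == n
        extra = [0.0] * n
        extra[i] = b  # sqrt(gamma) * e_i, with e_i living in the appended coordinates
        rotated.append([a * c for c in v] + extra)
    return rotated


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


v = [[1.0, 0.0], [-1.0, 0.0]]  # two antipodal unit vectors
w = outward_rotation(v, 0.5)
print(dot(w[0], w[0]), dot(w[0], w[1]))
```

For γ = 1 the rotated vectors are pairwise orthogonal, and for γ = 0 they coincide with the originals padded by zeros, matching the two extreme cases described in the text.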

3 Random Projection, Randomized Rounding (RPR²)

Let v_1, ..., v_n be a set of vectors obtained by the solution of a semidefinite
relaxation. We define a family of rounding procedures parameterized by a function
f : R → [0, 1]. We denote this family of rounding procedures as the random
projection, randomized rounding (RPR²) family. An RPR² procedure using f
has two steps and is defined as:

Step 1 (Projection): Project the vectors v_1, ..., v_n onto a random one
dimensional subspace (i.e., a line). This is done by choosing a random variable r
with n dimensional standard normal distribution, and projecting each v_i onto the
one dimensional subspace containing r. For each i, let x_i be the directed distance
(times ||r||) of the projected vector v_i from the origin (i.e., x_i = ⟨v_i, r⟩).

Step 2 (Randomized rounding): Define the {0, 1} solution a_1, ..., a_n: for each
i set a_i to be 1 independently with probability f(x_i).
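
The two steps can be sketched in a few lines. This is a minimal illustration with hypothetical names (`rpr2`, `hyperplane`), using only the standard library; vectors are plain lists of floats and f is any function into [0, 1].

```python
import random


def rpr2(vectors, f, rng=random):
    """One run of RPR^2: random projection followed by randomized rounding with f."""
    n = len(vectors[0])
    r = [rng.gauss(0.0, 1.0) for _ in range(n)]                 # standard normal vector
    xs = [sum(c * rc for c, rc in zip(v, r)) for v in vectors]  # x_i = <v_i, r>
    return [1 if rng.random() < f(x) else 0 for x in xs]        # a_i = 1 w.p. f(x_i)


# A step function at 0 turns step 2 into threshold rounding.
hyperplane = lambda x: 1.0 if x > 0 else 0.0

rng = random.Random(0)
print(rpr2([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]], hyperplane, rng))
```

With the step function, two antipodal vectors always receive different values, as expected from a hyperplane through the origin.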

The standard random hyperplane rounding technique presented in [GW95]
in the approximation of the Max-Cut problem is a member of the RPR² family.
The function f corresponding to random hyperplane rounding is the function
which is 1 for all x > 0, and zero otherwise. Later in this section we show that
the outward rotation rounding technique is also a special case of RPR². In
Section 4 we study the use of RPR² on the “light” Max-Cut problem and on the
Max-Bisection problem. For both these problems outward rotation was used in order
to obtain the previously best approximation ratios. We show functions f which
when used in RPR² give better approximation ratios.

Analyzing RPR². Let v_1, ..., v_n be the solution of a semidefinite relaxation on
a given graph G = (V, E), and let a_1, ..., a_n be the {0, 1} solution obtained by
using RPR² with some function f. An edge e_ij is cut if a_i ≠ a_j. As RPR² is a
randomized procedure, the number of edges cut is a random variable. We wish
to compute its expectation. For this, we analyze the probability of the event
a_i ≠ a_j.

Let r = (r_1, ..., r_n) be an n dimensional standard normal vector. In general,
given r, the probability that “a_i ≠ a_j” depends on the vectors v_i, v_j, and the
function f. Hence, integrating over all possible r one can compute the probability
of this event. However, as r is spherically symmetric this probability can be
computed using two independent standard normal random variables r_1, r_2, and
the angle between v_i and v_j alone. Given two vectors v_i and v_j that form an
angle of θ_ij, let P_f(θ_ij) denote the probability that the corresponding values
a_i and a_j differ. Let φ(x) = (1/√(2π)) e^{-x²/2} be the density function of a
standard normal random variable. The following lemma is straightforward (details
appear in [FL01]).

Lemma 1. Let θ ∈ [0, π] and z(r_1, r_2) = cos(θ) r_1 + sin(θ) r_2. Then

P_f(\theta) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty}
    \bigl[ f(r_1)\,(1 - f(z(r_1, r_2))) + f(z(r_1, r_2))\,(1 - f(r_1)) \bigr]
    \, \phi(r_1)\, \phi(r_2) \, dr_1 \, dr_2 .

By linearity of expectation, the expected number of edges cut is
Σ_{e_ij ∈ E} P_f(θ_ij), where θ_ij is the angle formed by the vectors v_i and v_j
corresponding to e_ij. Dividing this value by |E| we get the expected fraction of
edges cut, which we denote by E[Cut_f].
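
The probability that two values differ can also be estimated by simulation instead of evaluating the double integral. The sketch below is a Monte Carlo stand-in, not the numerical integration used in the paper: it draws the two independent standard normals r_1, r_2, forms z = cos(θ) r_1 + sin(θ) r_2, and averages f(r_1)(1 - f(z)) + f(z)(1 - f(r_1)). The name `estimate_pf` and the sample sizes are our choices.

```python
import math
import random


def estimate_pf(f, theta, samples=100_000, seed=0):
    """Monte Carlo estimate of P_f(theta), the probability that a_i != a_j."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        r1, r2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z = math.cos(theta) * r1 + math.sin(theta) * r2  # projection of the second vector
        total += f(r1) * (1 - f(z)) + f(z) * (1 - f(r1))
    return total / samples


hyperplane = lambda x: 1.0 if x > 0 else 0.0
# For the step function the estimate should be close to theta / pi.
print(estimate_pf(hyperplane, math.pi / 2))
```

For the step function at 0 the estimate approaches θ/π, the separation probability known from the analysis of random hyperplane rounding; for the constant function f = 1/2 every sample contributes exactly 1/2.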

For an edge that makes an angle of θ, let SDP(θ) = (1 - cos θ)/2 be its
contribution to the semidefinite program. Then a convenient lower bound on the
approximation ratio achieved for Max-Cut by RPR² with a rounding function f
is min_{θ>0} P_f(θ) / SDP(θ). For light Max-Cut, this lower bound is too
pessimistic. The angle θ minimizing the above expression cannot hold
simultaneously for all edges, because then the graph would contain a cut that is
too large. A stronger lower bound on the approximation ratio can be derived by
more detailed analysis, following principles outlined in [Zwi99]. For
Max-Bisection, there are more complications, because the cut obtained by the
rounding technique is not necessarily a bisection. An additional step of moving
vertices from the larger side to the smaller one is used, and analyzing its
effect (or at least, providing lower bounds) can be done using the techniques
outlined in [FJ97, Ye99, HZ00]. The
numerical bounds for the approximation ratios that are presented in this paper 
were derived by using analysis that follows the principles developed in [FWl 
])• 
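
As a worked instance of the bound min over θ of P_f(θ)/SDP(θ), take random hyperplane rounding, for which the closed form P_f(θ) = θ/π is known from the analysis of Goemans and Williamson (this closed form is imported here as a known fact, it is not derived in this paper). A simple grid search, sketched below with a grid size chosen by us, recovers the constant α ≈ 0.87856.

```python
import math


def hyperplane_ratio(theta):
    """P_f(theta) / SDP(theta) for random hyperplane rounding, with P_f = theta / pi."""
    return (theta / math.pi) / ((1 - math.cos(theta)) / 2)


# Grid over (0, pi]; theta = 0 is excluded because SDP(0) = 0.
grid = [math.pi * k / 100_000 for k in range(1, 100_001)]
alpha = min(hyperplane_ratio(t) for t in grid)
print(alpha)
```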






Outward rotation is a special case of RPR². Let v_1, ..., v_n be as above. Let γ
be some value in [0, 1]. Recall that the γ-outward rotation ṽ_1, ..., ṽ_n of the
vectors v_1, ..., v_n is defined by ṽ_i = √(1 - γ) v_i + √γ e_i ∈ R^{2n}.

In the standard use of outward rotation, a {0, 1} solution a_1, ..., a_n is
obtained by rounding the vectors ṽ_1, ..., ṽ_n by a random hyperplane.
Specifically, let r = (r_1, ..., r_2n) be a random vector with a 2n-dimensional
standard normal distribution. Define the solution a_1, ..., a_n by setting a_i to
be one iff the inner product ⟨ṽ_i, r⟩ is positive. It is convenient to describe
the solution a_1, ..., a_n as the subset U = {i ∈ V | a_i = 1} of V, i.e.,
U = {i ∈ V | ⟨ṽ_i, r⟩ > 0}. Using the definition of ṽ_i and the spherical symmetry
of r, we have that the set U obtained is equal to
{i ∈ V | √(1 - γ) ⟨v_i, (r_1, ..., r_n)⟩ + √γ r_{n+i} > 0}.

We would like to obtain the exact set U without the use of outward rotations.
Instead we would like to use RPR². Let φ(x) = (1/√(2π)) e^{-x²/2} and
Φ(x) = ∫_{-∞}^{x} φ(t) dt be the density function and distribution function of a
standard normal random variable. We obtain the following theorem (a detailed
proof can be found in [FL01]).



Theorem 2. For any γ ∈ [0, 1], let f_γ(x) = Φ(√((1 - γ)/γ) · x). Using RPR² with
f_γ is equivalent to γ-outward rotation followed by random hyperplane rounding.
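
The rounding function behind this equivalence can be read off the description of U: conditioned on the projection x_i of v_i, vertex i joins U exactly when the extra independent normal coordinate exceeds -√((1 - γ)/γ) · x_i, which happens with probability Φ(√((1 - γ)/γ) · x_i). The sketch below encodes this reconstruction for 0 < γ ≤ 1 (γ = 0 degenerates to the step function); the helper names are ours.

```python
import math


def std_normal_cdf(x):
    """Distribution function of a standard normal random variable."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def outward_rotation_f(gamma):
    """Rounding function x -> Phi(sqrt((1 - gamma) / gamma) * x), for 0 < gamma <= 1."""
    if not 0 < gamma <= 1:
        raise ValueError("gamma must lie in (0, 1]")
    scale = math.sqrt((1 - gamma) / gamma)
    return lambda x: std_normal_cdf(scale * x)


f_half = outward_rotation_f(0.5)
print(f_half(-1.0), f_half(0.0), f_half(1.0))
```

For γ = 1 the function is constant 1/2 (every vertex picks a side by a fair coin), which matches the fully orthogonal configuration produced by the rotation.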



In cases where outward rotation is used, it is natural to ask whether RPR² in
combination with a different rounding function f can give better approximation
ratios. It is our belief that the answer to this question is in general positive.
That is, whenever outward rotation gives better approximation ratios than the
random hyperplane rounding technique, one should expect RPR² to offer further
improvements.

Let us note that for RPR², if we restrict ourselves to well-behaved rounding
functions, finding the optimal rounding function f is not really a problem. For
any fixed ε > 0, there is a constant number of functions f (where this constant
depends exponentially on 1/ε) such that at least one of them has expected
approximation ratio within an additive error of at most ε from the optimal f.
(This can be shown using concepts such as ε-nets.) An RPR² algorithm can even try
out all these functions at run time and take the best result obtained. Hence, we
may always assume that RPR² is performed with the best possible rounding
function. The problem is in analyzing the approximation ratio that one obtains.
Here it is useful to select one particular easy-to-analyze rounding function f,
to compute the expected approximation ratio that this f gives, and to use it as a
lower bound on the approximation ratio of the RPR² scheme. In this respect,
outward rotations are helpful, as they can be analyzed not only using the
integrals of Lemma 1 but also via other techniques (as in [Zwi99]), and these
other techniques are often simpler to use. It would be fair to say that previous
work on outward rotation served as inspiration to much of the work reported in
the current paper.
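
The remark that an algorithm may simply try a finite family of rounding functions and keep the best cut can be sketched as follows. The family, the number of trials per function and the name `best_of_family` are illustrative assumptions; constructing an actual ε-net of rounding functions is not attempted here.

```python
import random


def best_of_family(vectors, edges, family, trials=20, seed=0):
    """Run RPR^2 with every function in `family` and keep the largest cut found.

    vectors[i] is the vector of vertex i; edges is a list of index pairs.
    """
    rng = random.Random(seed)
    n = len(vectors[0])
    best_cut, best_assignment = -1, None
    for f in family:
        for _ in range(trials):
            r = [rng.gauss(0.0, 1.0) for _ in range(n)]
            xs = [sum(c * rc for c, rc in zip(v, r)) for v in vectors]
            a = [1 if rng.random() < f(x) else 0 for x in xs]
            cut = sum(1 for i, j in edges if a[i] != a[j])
            if cut > best_cut:
                best_cut, best_assignment = cut, a
    return best_cut, best_assignment


step = lambda x: 1.0 if x > 0 else 0.0   # random hyperplane rounding
ramp = lambda x: min(1.0, max(0.0, 0.5 + x))  # a 0.5-linear rounding function
print(best_of_family([[1.0, 0.0], [-1.0, 0.0]], [(0, 1)], [step, ramp]))
```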



4 Applications of RPR²

Light Max-Cut:

Let G = (V, E) be a given graph, let v_1, ..., v_n be the optimal vector
configuration obtained by solving the semidefinite relaxation (SDP-MC) presented
in [GW95] (and in Section 2) of G, and let Z be the value of this relaxation. Can
RPR² (with some specific f) be used on the Max-Cut problem in order to improve the
approximation ratio of α ≈ 0.87856 proved in [GW95]? If we define α to be the
ratio between the expected value of the cut obtained using RPR² and the value
of the semidefinite relaxation Z, the answer is negative. This is due to a recent
work of Feige and Schechtman [FS01] that shows that the integrality gap of this
relaxation is arbitrarily close to 1/α. Therefore, we will not try to improve the
approximation ratio α on general instances G. Instead we shall consider special
cases of Max-Cut.

Consider parameterizing the instances of Max-Cut according to the ratio
between the value of the semidefinite relaxation Z and the total number of edges
W. Goemans and Williamson [GW95] study Max-Cut restricted to instances
G for which this ratio is greater than 0.844. For each value t ∈ (0.844, 1], they
show that using standard random hyperplane rounding on instances G for which
Z = tW will yield a cut of value at least α_t Z, where α_t > α (this implies an
approximation ratio of α_t). In [FS01] it is shown that the integrality gap of
(SDP-MC) on these restricted instances is arbitrarily close to 1/α_t. Therefore,
we will not try to improve the algorithm of [GW95] restricted to such instances
either.

Zwick [Zwi99] studies Max-Cut restricted to instances G for which Z = tW
and t < 0.844. We call this “light” Max-Cut. For these instances Zwick shows
how to obtain approximation ratios α_t > α, using outward rotation followed
by random hyperplane rounding. The value of the integrality gap of (SDP-MC)
on these restricted instances is not clear. We analyze the use of RPR² on light
Max-Cut. Roughly speaking, we show: (a) Necessary conditions for a rounding
function f to be the one that maximizes the expected value of the cut obtained by
RPR². (b) Outward rotation (a special case of RPR²) is not the best rounding
function for RPR², as it does not satisfy these necessary conditions. (c) We
present an s-linear rounding function that gives an approximation ratio strictly
above the ratio of α_t presented in [Zwi99]. We suspect that RPR² (with the
optimal choice of f) achieves an approximation ratio that matches the inverse
of the integrality gap of (SDP-MC) (as a function of t). We are trying to extend
the techniques of [FS01] in order to prove this.

Our analysis involves the numerical evaluation of double integrals (presented
in Lemma 1). These evaluations have been performed using MATLAB functions
within precision of 10“®. As such computations are time consuming, item (b) 
and (c) above are shown for a few values of t (namely t = 0.55,0.6,0.7). 

Properties of the best function f for RPR². Given a graph G = (V, E)
and a set of vectors v_1, ..., v_n obtained by solving (SDP-MC) on G, let E[Cut_f]
be the expected fraction of edges cut by using RPR² with a function f. Call a
function well behaved if it is piecewise continuous. We are interested in finding
a well behaved function f* that maximizes E[Cut_f].

For light Max-Cut we identify a necessary condition for any well behaved
function f that maximizes E[Cut_f]. We use this necessary condition to prove
that the functions f corresponding to outward rotation are not optimal.
Moreover, this necessary condition helps to guide us in finding a better rounding
function (in our case, an s-linear function), without resorting to a tedious
exhaustive-search type approach for such a function (as mentioned in Section 3).

A natural property that one could expect from an optimal function f* is that
rounding the vectors v_1, ..., v_n using RPR² with f* yields a cut (U, V \ U) which
is expected to be locally optimal (i.e., there are no vertices with an expected
majority of neighbors on their side of the cut). For instance, for the function
f(x) = 1/2 this property holds. The necessary condition we suggest is closely
related to this “local optimality” property.

Let G = (V, E) be a given graph and v_1, ..., v_n be the set of vectors obtained
by solving (SDP-MC) on G. Let f be some RPR² function. Recall that P_f(θ)
(defined in Lemma 1) measures the probability that, using RPR² with the function
f, two vectors v_i and v_j that form an angle of θ have corresponding values
a_i and a_j that differ. Let E[Cut_f] be the expected fraction of edges cut by
using RPR² with f. That is, E[Cut_f] is a normalized sum of P_f(θ_ij), where θ_ij
is the angle between the vectors corresponding to edges e_ij in E. Consider the
probability P_f(θ_ij) conditioned on the event that the inner product between v_i
and the random vector r used in RPR² is fixed to be a specific value r_1. We
denote this probability as P_f(θ_ij)|r_1. Define E[Cut_f | r_1] as the corresponding
normalized sum of P_f(θ_ij)|r_1.

Theorem 3. If f* is an optimal (well behaved) RPR² function then for all r_1 we
have that E[Cut_{f*} | r_1] ≥ 1/2, with equality if 0 < f*(r_1) < 1 (i.e.,
f*(r_1) ∉ {0, 1}).

The proof of the above theorem appears in [FL01]. We would like to note that our
proof is done in a constructive manner. That is, if f is some RPR² function
that does not satisfy the above conditions in some interval Δ, we show how to
modify f in Δ to obtain a new function f* such that E[Cut_{f*}] > E[Cut_f], thus
implying that f is not optimal.

Outward rotation is not the best function for RPR². For an instance
G = (V, E) of Max-Cut, let W = |E|, let v_1, ..., v_n be the vector configuration
obtained by solving relaxation (SDP-MC) on G and let Z be the value of the
relaxation. Let t ∈ [0.5, 0.844). Assume a graph G = (V, E), with a corresponding
vector configuration of value Z = tW. In [Zwi99] it is shown that by
rounding v_1, ..., v_n using γ-outward rotation, an expected approximation ratio
strictly above α_t will be obtained unless for each edge e_ij in E, the
corresponding vectors v_i and v_j form an angle of either zero or θ_t (where θ_t is
some specific angle greater than π/2 that depends on t).

In other words, only on graphs G = (V, E) with a corresponding vector
configuration in which a δ fraction of edges e_ij in E have corresponding vectors
v_i and v_j that form an angle of zero, and a 1 - δ fraction of edges have
corresponding vectors that form an angle of θ_t, does the algorithm of [Zwi99]
obtain an approximation ratio of no better than α_t. On all other graphs the
algorithm of [Zwi99] has an approximation ratio strictly greater than α_t.

Let f_γ be the RPR² function corresponding to γ-outward rotation. It can
be seen (see [FL01]) that for such worst case graphs there exists a non-negligible
interval Δ ⊂ R and a constant ε > 0 such that E[Cut_{f_γ} | r_1] < 1/2 - ε for all
r_1 ∈ Δ (for instance, for t = 0.6 we have E[Cut_{f_γ} | r_1] < 0.493 for
r_1 ∈ [0.3, 0.4]). By a quantitative version of Theorem 3 we may construct a new
function f* by modifying f_γ in the interval Δ such that
E[Cut_{f*}] > E[Cut_{f_γ}] + poly(ε). We conclude

Theorem 4. There exists a constant ε > 0 such that, using RPR² on the worst
case graphs of [Zwi99], an approximation ratio of α_t + ε can be obtained.

This implies an improved approximation algorithm for Max-Cut on general
instances G with Z = tW. If the given graph has a vector configuration close
to the worst case configuration, use the best function for RPR²; otherwise use
the original algorithm of [Zwi99] (we rely on the fact that RPR² rounding is
continuous with respect to the vector configuration v_1, ..., v_n).

As noted previously, our analysis involves the numerical evaluation of inte- 
grals, thus the above theorem has been proven for t = 0.55, 0.6, 0.7. We have no 
reason to believe that our results depend on these particular values of t. 

An example for superior RPR$^2$ functions: We have shown that an
approximation ratio greater than $\alpha_t$ can be obtained on graphs $G$ with $Z = tW$
by improving the approximation ratio obtained on the worst case graphs of [Zwi99].
In the following we show that such an improvement can be proven directly. That is,
given a value $t$ and a graph $G$ with $Z = tW$, we are interested in proving a lower
bound on the expected value of the cut obtained by RPR$^2$. This can be done by
choosing some function $f$ and analyzing the value of $P_f(\theta)$ for every $\theta \in [0, \pi]$.

Let $s$ be some threshold; recall that an $s$-linear function $f_s$ is the
continuous function that is zero for all $x \le -s$, one for all $x \ge s$, and linear for all
$x \in (-s, s)$. By replacing the function that corresponds to outward rotation
by an $s$-linear function $f_s$, the following approximation ratios were achieved: for
$t = 0.55$ and $s = 0.96$ a ratio of 0.942562 (as opposed to 0.941282 of [Zwi99]),
for $t = 0.6$ and $s = 0.635$ a ratio of 0.912809 (as opposed to 0.911890 of [Zwi99]),
and for $t = 0.7$ and $s = 0.263$ a ratio of 0.886453 (as opposed to 0.886251 of
[Zwi99]). The functions $f_s$ are not claimed to be optimal, but as we have checked
many different functions, we believe that they are close to being so. Hence, it
seems that the original functions corresponding to outward rotation are very
close to being optimal.
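
To make the definition concrete, the following is a minimal Python sketch of an $s$-linear function and of one rounding pass that uses it. The names `s_linear` and `rpr2_round` are ours, and the rounding step assumes the usual reading of RPR$^2$ (project each vector on a random Gaussian direction, then place the vertex on one side with probability given by the function). It is an illustration under these assumptions, not the procedure used to obtain the ratios above.

```python
import random


def s_linear(x: float, s: float) -> float:
    """Continuous s-linear function: 0 for x <= -s, 1 for x >= s, linear in between."""
    if s <= 0:
        raise ValueError("s must be positive")
    if x <= -s:
        return 0.0
    if x >= s:
        return 1.0
    return (x + s) / (2.0 * s)


def rpr2_round(vectors, s, rng):
    """One rounding pass (assumed form of RPR^2, see the lead-in).

    vectors: list of equal-length lists of floats (the vector configuration).
    Returns a list of booleans; True means the vertex is placed in U.
    """
    dim = len(vectors[0])
    # Random Gaussian direction r.
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    side = []
    for v in vectors:
        x = sum(a * b for a, b in zip(v, r))   # projection <v, r>
        side.append(rng.random() < s_linear(x, s))
    return side
```

For example, `s_linear(0.48, 0.96)` evaluates the 0.96-linear function at the midpoint between 0 and $s$, which gives 0.75.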

Max-Bisection: 

Given a graph $G = (V,E)$, the Max-Bisection problem is the problem of finding
a partition $(U, V \setminus U)$ of $V$ into two equally sized sets (i.e., a bisection of $V$) that
maximizes the number of edges cut by the partition. A number of approximation
algorithms for Max-Bisection based on semidefinite programming have been
suggested [FJ97, Ye99, HZ00], yielding approximation ratios of 0.6514, 0.699, and 0.7016,
respectively. In these algorithms, a semidefinite relaxation of Max-Bisection is
solved, yielding a set of vectors $v_1, \ldots, v_n$. These vectors are then rounded (using
the random hyperplane or outward rotation technique) in order to obtain a cut
$(U, V \setminus U)$ of $G$. This cut is not necessarily a bisection, so the cut $(U, V \setminus U)$ is
modified by moving vertices from the larger side of the cut to the smaller side
until both sides are equal. As in the case of Max-Cut, we analyze the use of
RPR$^2$ in the algorithm above and conclude the following theorem (our analysis
is based on that presented in [HZ00]).

Theorem 5. Using RPR$^2$ with a 0.605-linear rounding function, Max-Bisection
can be approximated within an approximation ratio of 0.7027.

5 Conclusions 

Many questions remain open, but seem within reach. For “light” Max Cut, we
suspect that RPR$^2$ (with the optimal choice of $f$) achieves an approximation
ratio that matches the integrality ratio (as a function of the relative size of the
maximum cut). We are trying to extend the techniques of [?] in order to
prove this. For Max Bisection, we suspect that more substantial improvements
of the approximation ratio can be proven for other choices of rounding function
$f$. For some other problems, especially those currently analyzed using outward
rotation (such as Not-All-Equal-3SAT [Zwi99]), it is natural to assume that the
approximation ratio can be improved using RPR$^2$, but this remains to be seen.

Abstract. We study the generalization of covering problems to partial
covering. Here we wish to cover only a desired number of elements,
rather than covering all elements as in standard covering problems. For
example, in $k$-set cover, we wish to choose a minimum number of sets
to cover at least $k$ elements. For $k$-set cover, if each element occurs in
at most $f$ sets, then we derive a primal-dual $f$-approximation algorithm
(thus implying a 2-approximation for $k$-vertex cover) in polynomial
time. In addition to its simplicity, this algorithm has the advantage
of being parallelizable. For instances where each set has cardinality
at most three, we obtain an approximation of 4/3. We also present
better-than-2-approximation algorithms for $k$-vertex cover on bounded
degree graphs, and for vertex cover on expanders of bounded average
degree. We obtain a polynomial-time approximation scheme for $k$-vertex
cover on planar graphs, and for covering points in $\mathbb{R}^d$ by disks.

Keywords and Phrases: Approximation algorithms, partial covering, 
set cover, vertex cover, primal-dual methods, randomized rounding. 



1 Introduction 

Covering problems are widely studied in discrete optimization: basically, these
problems involve picking a least-cost collection of sets to cover elements.
Classical problems in this framework include the general set cover problem, of which a
widely studied special case is the vertex cover problem. (The vertex cover
problem is a special case of set cover in which the edges correspond to elements and
vertices correspond to sets; in this set cover instance, each element is in exactly
two sets.) Both these problems are NP-hard, and polynomial-time approximation
algorithms for both are well studied. For set cover see [?]. For vertex cover
see [?].

In this paper we study the generalization of “covering” to “partial covering”
[?]. Specifically, in $k$-set cover, we wish to find a minimum number (or, in
the weighted version, a minimum weight collection) of sets that cover at least
$k$ elements. When $k$ is the total number of elements, we obtain the regular set
cover problem; similarly for $k$-vertex cover. (We sometimes refer to $k$-set cover as
“partial set cover”, and $k$-vertex cover as “partial vertex cover”; the case where
$k$ equals the total number of elements is referred to as “full coverage”.) This
generalization is motivated by the fact that real data (in clustering, for example)
often has errors (also called outliers). Thus, discarding the (small) number of
constraints posed by such errors/outliers is permissible. Suppose we need to
build facilities to provide service within a fixed radius to a certain fraction of
the population. We can model this as a partial set cover problem. The main
issue in partial covering is: which $k$ elements should we choose to cover? If such
a choice can be made judiciously, we can then invoke a set cover algorithm. Other
facility location problems have recently been studied in this context [?].

Regarding vertex cover, a very simple approximation algorithm for the
unweighted case is attributed to Gavril and Yannakakis, and can be found, e.g.,
in [?]: take a maximal matching and pick all the matched vertices as part of
the cover. The size of the matching (number of edges) is a lower bound on the
optimal vertex cover, and this yields a 2-approximation. This algorithm fails for
partial covering, since the lower bound relies on the fact that all the edges have
to be covered: in general, approximation algorithms for vertex cover may return
solutions that are much larger than the optimal value of a given $k$-vertex cover
instance. The first approximation algorithm for $k$-vertex cover was given in [?].
Their 2-approximation algorithm is based on a linear programming (LP)
formulation: suitably modifying and rounding the LP’s optimal solution. A faster
approximation algorithm achieving the same factor of 2 was given in [?]; here,
the key idea is to relax the constraint limiting the number of uncovered elements
and to search for the dual penalty value. More recently, a 2-approximation based
on the elegant “local ratio” method was given in [?].
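
The maximal-matching algorithm for full vertex cover described above can be sketched in a few lines of Python. The function name and the edge-list input format are illustrative choices of ours; the sketch builds the matching greedily and assumes a simple undirected graph.

```python
def matching_vertex_cover(edges):
    """Return (cover, matching) where matching is a greedily built maximal
    matching and cover is the set of all matched vertices."""
    cover = set()
    matching = []
    for u, v in edges:
        # An edge joins the matching only if both end-points are still unmatched.
        if u != v and u not in cover and v not in cover:
            matching.append((u, v))
            cover.update((u, v))
    return cover, matching
```

Every edge has at least one matched end-point (otherwise it could be added to the matching), and any vertex cover needs one vertex per matching edge, so the returned cover has at most twice the optimal size. On a star with three leaves the sketch returns two vertices while the optimum is one, which shows the factor of 2 can be attained.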

Problem Definitions and Previous Work 

$k$-Set Cover: Given a set $T = \{t_1, t_2, \ldots, t_n\}$, a collection $S$ of subsets of $T$,
$S = \{S_1, S_2, \ldots, S_m\}$, a cost function $c : S \to \mathbb{Q}^+$, and an integer $k$, find a
minimum cost sub-collection of $S$ that covers at least $k$ elements of $T$.

For the full coverage version, a $(\ln n + 1)$-approximation was proposed in [?].
This analysis of the greedy algorithm can be improved to $H(\Delta)$ (see the proof
in [?]), where $\Delta$ is the size of the largest set. ($H(k) = \sum_{i=1}^{k} 1/i = \ln k + \Theta(1)$.)
Chvátal [?] generalized this to the case when sets have costs. Slavík [?] shows
the same bound for the partial cover problem. When $\Delta = 3$, Duh and Fürer
[?] gave a 4/3-approximation for the full coverage version. They extended this
result to get a bound of $H(\Delta) - \frac{1}{2}$ for full coverage. When an element belongs
to at most $f$ sets, Hochbaum [?] gives an $f$-approximation.

$k$-Vertex Cover: Given a graph $G = (V,E)$, a cost function $c : V \to \mathbb{Q}^+$, and
an integer $k$, find a minimum cost subset of $V$ that covers at least $k$ edges of $G$.

Several 2-approximation algorithms are known for this; see [?].

Geometric Covering Problem: Given $n$ points in a plane, find a minimally
sized set of disks of diameter $D$ that would cover at least $k$ points.

Previous Results: The full coverage version is well-studied. This problem is
motivated by the location of emergency facilities as well as by image processing
(see [?] for additional references). For the special case of geometric covering
problems, a polynomial-time approximation scheme is shown in [?].

Our Results 

Full proofs of the claims in this paper are given in [?].

$k$-Set Cover: For the special case when each element is in at most $f$ sets, we
combine a primal-dual algorithm [?] with a thresholding method to obtain an
$f$-approximation. One advantage of our method, in addition to its simplicity, is
that it can be easily parallelized by changing the algorithm slightly. The resulting
approximation factor is $f(1 + \varepsilon)$, where $\varepsilon > 0$ is any desired constant. The number
of parallel rounds is $O(\log n)$ once we fix $\varepsilon > 0$. The number of processors
required is linear in the problem size. This is the first parallel approximation
algorithm for any partial covering problem. For set cover where the sets have
cardinality at most $\Delta$, there are results (starting from [?]) by Duh and Fürer
[?] for set cover (full coverage) that improve the $H(\Delta)$ bound to $H(\Delta) - \frac{1}{2}$.
For example, for $\Delta = 3$ they present a $\frac{4}{3}$ ($= H(3) - \frac{1}{2}$) approximation using
“semi-local” optimization rather than the $\frac{11}{6}$-approximation obtained by the simple
greedy algorithm. For the case $\Delta = 3$, we can obtain a $\frac{4}{3}$ bound for the partial
coverage case.
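
As a quick check of the constants for sets of size at most three, the harmonic number can be expanded directly (worked arithmetic added for illustration):

```latex
\[
H(3) = 1 + \tfrac{1}{2} + \tfrac{1}{3} = \tfrac{11}{6},
\qquad
H(3) - \tfrac{1}{2} = \tfrac{11}{6} - \tfrac{3}{6} = \tfrac{8}{6} = \tfrac{4}{3}.
\]
```

So the greedy bound $H(3)$ is about 1.83, while the semi-local bound is about 1.33.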

$k$-Vertex Cover: By switching to a probabilistic approach to rounding the LP
relaxation of the problem, we obtain improved results for $k$-vertex cover, where
we wish to choose a minimum number of vertices to cover at least $k$ edges.
An outstanding open question for vertex cover (full coverage) is whether the
approximation ratio of 2 is best possible; see, e.g., [?]. Thus, it has been an
issue of much interest to identify families of graphs for which constant-factor
approximations better than 2 (which we denote by Property (P)) are possible.
In the full coverage case, Property (P) is true for graphs of bounded maximum
degree; see, e.g., [?]. How can we extend such a result? Could Property (P) hold
for graphs of constant average degree? This is probably not the case, since this
can be shown to imply Property (P) for all graphs. As a step toward seeing
which graph families of constant average degree enjoy Property (P), we show
that for expander graphs of bounded average degree, Property (P) is true. We
also show Property (P) for $k$-vertex cover in the case of bounded maximum
degree and arbitrary $k$; this is the first Property (P) result for $k$-vertex cover, to
our knowledge. We also present certain new results for multi-criteria versions
of $k$-vertex cover.

Geometric Covering: There is a polynomial approximation scheme based on
dynamic programming for the full coverage version [?]. For the partial coverage
version, since we do not know which $k$ points to cover, we have to define a new
dynamic program. This makes the implementation of the approximation scheme
due to [?] more complex, although it is still a polynomial-time algorithm.

$k$-Vertex Cover for Planar Graphs: We are able to use the dynamic
programming ideas developed for the geometric covering problem to design a
polynomial-time approximation scheme (PTAS) for $k$-vertex cover for planar graphs. This is
based on Baker’s method for the full covering case [?]. The details are omitted
from this extended abstract; the interested reader is referred to [?].

2 $k$-Set Cover

The $k$-Set Cover problem can be formulated as an integer program as follows. We
assign a binary variable $x_j$ for each $S_j \in S$, i.e., $x_j \in \{0, 1\}$. In this formulation,
$x_j = 1$ iff set $S_j$ belongs to the cover. A binary variable $y_i$ is assigned to each
element $t_i \in T$; $y_i = 1$ iff $t_i$ is not covered. Clearly, there can be at most
$n - k$ such uncovered elements. An LP relaxation is obtained by letting the
variables be reals in $[0,1]$. The LP is to minimize $\sum_{j=1}^{m} c(S_j) \cdot x_j$ subject to:
(i) $y_i + \sum_{j : t_i \in S_j} x_j \ge 1$, $i = 1, 2, \ldots, n$; (ii) $\sum_{i=1}^{n} y_i \le n - k$; (iii) $x_j \ge 0$,
$j = 1, 2, \ldots, m$; and (iv) $y_i \ge 0$, $i = 1, 2, \ldots, n$. The dual LP contains a variable
$u_i$ (for each element $t_i \in T$) corresponding to each of the first $n$ constraints in
the above LP. The dual variable $z$ corresponds to the $(n+1)$-st constraint in the
above LP formulation. The dual LP is to maximize $\sum_{i=1}^{n} u_i - (n - k) \cdot z$ subject
to: (i) $\sum_{i : t_i \in S_j} u_i \le c(S_j)$ for $j = 1, 2, \ldots, m$, (ii) $0 \le u_i \le z$ for $i = 1, 2, \ldots, n$,
and (iii) $z \ge 0$.

The algorithm SetCover does the following. The algorithm “guesses” the
set with the highest cost in the optimal solution by considering each set in turn
to be the highest cost set. For each set $S_j$ that is chosen to be the highest cost set,
$S_j$ along with all the elements it contains is removed from the instance
and is included as part of the cover for this guess of the highest cost set. The
cost of all sets having a higher cost than $c(S_j)$ is raised to $\infty$. $I_j = (T^j, S^j, c', k_j)$
is the modified instance. SetCover then calls Primal-Dual on $I_j$, which uses
a primal-dual approach [?] to return a set cover for $I_j$. In Primal-Dual, the
dual variables $u_i$ are increased for all $t_i \in T^j$ until there exists a set $S_i$ such
that $\sum_{i' : t_{i'} \in S_i} u_{i'} = c'(S_i)$. Sets are chosen this way until the cover is feasible.
The algorithm then chooses the minimum cost solution among the $m$ solutions
found. The pseudo-code for this algorithm can be found in [?].
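
The outline above can be illustrated by the following simplified Python sketch. It is our own reading of the description, not the authors' pseudo-code: the function names, the dictionary-based input format, and the choice to raise the duals of the still-uncovered elements uniformly are assumptions, and details that the analysis in the full version may depend on are omitted. Sets costlier than the guessed set are simply dropped instead of being given cost $\infty$.

```python
from fractions import Fraction


def primal_dual(elements, sets, cost, k):
    """Raise the duals of uncovered elements uniformly; whenever a set becomes
    tight (its slack reaches 0) add it to the cover. Stop once k elements are covered."""
    uncovered = set(elements)
    slack = {j: Fraction(cost[j]) for j in sets}   # c'(S_j) minus the duals inside S_j
    chosen, covered = [], 0
    while covered < k:
        best = None
        for j, members in sets.items():
            live = len(members & uncovered)
            if j in chosen or live == 0:
                continue
            step = slack[j] / live                 # dual increase that makes S_j tight
            if best is None or step < best[0]:
                best = (step, j)
        if best is None:
            raise ValueError("fewer than k elements can be covered")
        step, tight = best
        for j, members in sets.items():
            if j not in chosen:
                slack[j] -= step * len(members & uncovered)
        chosen.append(tight)
        covered += len(sets[tight] & uncovered)
        uncovered -= sets[tight]
    return chosen


def set_cover(elements, sets, cost, k):
    """Try every set as the guessed highest-cost set; return (cost, chosen sets)
    of the cheapest solution found, or None if no guess is feasible."""
    best = None
    for g, base in sets.items():
        # Modified instance: remove S_g and its elements, drop costlier sets.
        rest = {j: s - base for j, s in sets.items() if j != g and cost[j] <= cost[g]}
        try:
            extra = primal_dual(set(elements) - base, rest, cost, k - len(base))
        except ValueError:
            continue
        total = sum(cost[j] for j in [g] + extra)
        if best is None or total < best[0]:
            best = (total, [g] + extra)
    return best
```

Exact rational arithmetic (`Fraction`) is used so that tightness of a set is detected without floating-point error.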

Theorem 1. SetCover($T$, $S$, $c$, $k$) returns an $f$-approximate solution, where $f$
is the highest frequency of any element, i.e., an element appears in at most $f$ sets.

Corollary 1. SetCover($E$, $V$, $c$, $k$) gives a 2-approximate solution for $k$-Vertex
Cover.



2.1 Parallel Implementation of Partial Set Cover Algorithm 

We assume as before that each element belongs to at most $f$ sets. The
framework for the algorithm is the same as the one we described for the
primal-dual serial algorithm. The parallel algorithm runs in “rounds”. In each round,
we simultaneously raise all dual variables $u_i$ corresponding to the uncovered
elements. In the serial algorithm we pick one set in each iteration, namely
a set $S_j$ such that $\sum_{i : t_i \in S_j} u_i = c'(S_j)$. (Recall that $c'$ denotes the
modified cost function.) We change this step in the algorithm to pick all sets such
that $c'(S_j) - \sum_{i : t_i \in S_j} u_i \le \varepsilon\, c'(S_j)$. (This condition will let us prove that
$c'(S_j) \le \bigl(\sum_{i : t_i \in S_j} u_i\bigr)/(1 - \varepsilon)$.) We stop as soon as we have covered at least
$k$ elements. Suppose the algorithm covers at least $k$ elements after $\ell$ rounds. The
main problem is that in the last round we can include many sets simultaneously,
while we can afford to include only a few. Let $\delta$ be the number of elements that
we need to cover after round $\ell - 1$. To select an appropriate subset of the chosen
sets, we need to pick a minimal collection of chosen sets that cover at least $\delta$
elements. To accomplish this, we order the sets chosen in the last iteration
arbitrarily. Now compute in parallel the “effective” number of elements each set
covers and choose a minimal collection based on the fixed ordering. (All these
steps can be implemented in parallel using prefix computations.)

Theorem 2. The parallel algorithm runs in $(1 + f \log(1/\varepsilon))(1 + \log n)$ rounds,
with each round running in $O(\log n)$ time; the number of processors is linear in
the size of the input. The algorithm produces an approximate solution.

3 Set Cover for Small Sets 

Problem: Given a collection $C$ of small subsets of a base set $U$. Each small
subset in the collection has size at most $\Delta$, and their union is $U$. The objective
is to find a minimum size sub-collection that covers at least $k$ elements.

Here we have the original partial set cover instance with the additional
information that the sets are of “small” size, i.e., $\Delta$ is small. We obtain an
approximation factor of 4/3 for the case when $\Delta = 3$ using the idea of $(s, t)$ semi-local
optimization [?]. This technique consists of inserting up to $s$ 3-sets (sets of size
3) and deleting up to $t$ 3-sets from the current cover. Then the elements that
are not covered by the 3-sets (already existing ones + the newly added) are
covered optimally using 2-sets and 1-sets. This can be solved in polynomial time
using maximum matching [?]. The vertices are the uncovered elements of $U$ and
the edges are the admissible 2-sets. The 2-sets corresponding to the maximum
matching edges and the 1-sets corresponding to the vertices not covered by the
maximum matching form an optimum covering. We order the quality of a
solution by the number of sets in the cover; among two covers of the same
size we choose the one with fewer 1-sets, and if the covers have the same size and
neither cover has a 1-set we choose the one that covers more elements.

The algorithm starts with any solution. One solution can be obtained as 
follows. Choose a maximal collection of disjoint 3-sets. Cover the remaining ele- 
ments optimally using 2-sets and 1-sets. Perform semi-local (2, 1) improvements 
until no improvement is possible. 

The proof for the bound of 4/3 for full coverage does not extend to the partial
coverage version. For full coverage, to prove the lower bound on the optimal
solution Duh and Fürer construct a graph $G$ in which the vertices are the sets
chosen by OPT and the edges are 1-sets and 2-sets of the approximate solution.
They prove that $G$ cannot have more than one cycle and hence argue that the
total number of 1-sets and 2-sets in the solution is a lower bound on OPT. This
works well for the full coverage version but breaks down for the partial covering
problem. For the partial covering case, $G$ having at most one cycle is a necessary
but not a sufficient condition to prove the lower bound.

In the full coverage version of the problem, to bound the number of 1-sets in the
solution they construct a bipartite graph with the two sets of vertices corresponding
to the sets chosen by the approximate solution and by OPT. If a set corresponding
to the approximate solution intersects a set corresponding to OPT in $m$ elements,
then there are $m$ edges between their corresponding vertices in the graph. In each
component of the graph they show that the number of 1-sets of the solution in
that component is at most the number of 1-sets of OPT in that component.
This is clearly not the case in the partial covering case. We obtain a bound on
the number of 1-sets as a side effect of the proof for the lower bound on OPT.

Theorem 3. The semi-local (2,1)-optimization algorithm for the 3-set partial
covering problem produces a solution that is within $\frac{4}{3}\,\mathrm{OPT} + 1$.

4 Probabilistic Approaches for $k$-Vertex Cover

We now present a randomized rounding approach to the natural LP relaxation
of $k$-vertex cover. Analyzed in three different ways, this leads to three new
approximation results mentioned in Section 1, relating to vertex cover (full coverage) for
expander graphs of constant average degree, $k$-vertex cover on bounded-degree
graphs, and multi-criteria $k$-vertex cover problems. The $k$-vertex cover problem
on a graph $G = (V, E)$ can be formulated as an integer program as follows. We
assign binary variables $x_j$ for each $v_j \in V$ and $z_{ij}$ for each $(i,j) \in E$. Here,
$x_j = 1$ iff vertex $v_j$ belongs to the cover, and $z_{ij} = 1$ iff edge $(i,j)$ is covered.
The LP relaxation is obtained by letting each $x_j$ and $z_{ij}$ lie in $[0, 1]$:



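\begin{align*}
\min\ & \sum_{j=1}^{n} x_j \quad \text{subject to} \\
& x_i + x_j \ge z_{ij}, \quad (i,j) \in E \tag{1} \\
& \sum_{(i,j) \in E} z_{ij} \ge k \tag{2} \\
& x_j,\ z_{ij} \in [0, 1].
\end{align*}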

Our basic approximation recipe will be as follows. The LP relaxation is solved
optimally. Let $\{x_j^*\}, \{z_{ij}^*\}$ denote an optimal LP solution, and let $\lambda = 2(1 - \varepsilon)$,
where $\varepsilon \in [0, 1/2]$ is a parameter that will be chosen based on the application.
Let $S_1 = \{v_j \mid x_j^* \ge 1/\lambda\}$, and $S_2 = V - S_1$. Include all the vertices in $S_1$ as
part of our cover, and mark the edges incident on vertices in $S_1$ as covered. Now,
independently for each $j \in S_2$, round $x_j$ to 1 with a probability of $\lambda x_j^*$, and
to 0 with a probability of $1 - \lambda x_j^*$. Let $W$ be the random variable denoting the
number of covered edges at this point. If $W < k$, we choose any $k - W$ uncovered
edges and cover them by arbitrarily choosing one end-point for each of them.
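
The rounding recipe can be sketched in Python as follows. This is a minimal illustration under our own assumptions: the fractional solution is passed in as a dictionary `x_star` (solving the LP is outside the sketch), the function name is hypothetical, and the fix-up step always picks the first end-point of each chosen uncovered edge.

```python
import random


def round_k_vertex_cover(edges, x_star, k, eps, rng):
    """Round a fractional solution x_star (dict: vertex -> value in [0, 1]).

    Returns (cover, w), where w is the number of edges covered right after
    the randomized rounding, before the fix-up step.
    """
    if not 0.0 <= eps <= 0.5:
        raise ValueError("eps must lie in [0, 1/2]")
    lam = 2.0 * (1.0 - eps)
    cover = set()
    for v, x in x_star.items():
        if x >= 1.0 / lam:
            cover.add(v)                  # v is in S_1: always taken
        elif rng.random() < lam * x:
            cover.add(v)                  # v is in S_2: taken with probability lam * x
    uncovered = [(u, v) for u, v in edges if u not in cover and v not in cover]
    w = len(edges) - len(uncovered)       # W: edges covered by the rounding
    for u, _ in uncovered[:max(k - w, 0)]:
        cover.add(u)                      # fix-up: one end-point per chosen edge
    return cover, w
```

Whatever the random choices are, the returned cover covers at least $k$ edges (provided the graph has at least $k$ edges), because each of the $k - W$ selected edges receives an end-point.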

We now introduce some notation to analyze the above process. Throughout,
we let $\Pr[\cdot]$ and $E[\cdot]$ denote probability and expectation, respectively. Let $y^*$
represent the optimal objective function value of the LP, and define $S_0 \subseteq S_1$ by
$S_0 = \{v_j : x_j^* = 1\}$. Let $y_1^*$ and $y_2^*$ be the contribution to $y^*$ of the vertices in $S_0$
and $V - S_0$, respectively. Denote by $U_{ij}$ the event that edge $(i,j)$ is uncovered.
Let $C_1$ be the cost of the solution produced by our randomized scheme before
the step of covering $k - W$ edges if necessary, and let $C_2$ be the cost incurred in
covering these $k - W$ edges, if any. The total cost $C$ is of course $C_1 + C_2$; thus,
$E[C] = E[C_1] + E[C_2]$. Now, it is easy to check that $E[C_1] \le y_1^* + \lambda y_2^*$, and that
$E[C_2] \le E[\max\{k - W, 0\}]$. So we have

\[
E[C] \le y_1^* + \lambda y_2^* + E[\max\{k - W, 0\}]. \tag{3}
\]

As usual, let $\overline{\mathcal{E}}$ denote the complement of an event $\mathcal{E}$. Lemma 1 on the
statistics of $W$ will be useful; we only give a proof sketch here.

Lemma 1. (i) $E[W] \ge k(1 - \varepsilon^2)$. (ii) Suppose the graph $G$ has maximum degree
$d$. Then, the variance $\mathrm{Var}[W]$ of $W$ is at most $d \cdot E[W]$.

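Proof. (i) Consider any edge $(i,j)$. Now if $x_i^* \ge 1/\lambda$ or $x_j^* \ge 1/\lambda$, $\Pr[U_{ij}] = 0$;
otherwise, $\Pr[U_{ij}] = (1 - \lambda x_i^*)(1 - \lambda x_j^*)$. In the latter case, since $x_i^* + x_j^* \ge z_{ij}^*$
and $z_{ij}^* \in [0, 1]$, we can show that $\Pr[U_{ij}] \le (1 - \lambda z_{ij}^*/2)^2 \le 1 - z_{ij}^*(1 - \varepsilon^2)$.
Since $E[W] = \sum_{(i,j) \in E} \Pr[\overline{U_{ij}}]$ and $\sum_{(i,j) \in E} z_{ij}^* \ge k$, we get (i).

(ii) We have $W = \sum_{(i,j) \in E} \overline{U_{ij}}$, identifying each event with its indicator
variable. It can be checked that if a random variable
$W'$ is the sum of pairwise independent random variables each of which lies in
$[0, 1]$, then $\mathrm{Var}[W'] \le E[W']$. However, the terms $\overline{U_{ij}}$ that constitute $W$ do have
some dependent pairs: if edges $(i,j)$ and $(i',j')$ share an endpoint, then $\overline{U_{ij}}$ and
$\overline{U_{i'j'}}$ are dependent. Define $\gamma$ to be the sum, over all unordered pairs of distinct
edges $(i,j)$ and $(i',j')$ that share an end-point, of $\Pr[\overline{U_{ij}} \wedge \overline{U_{i'j'}}]$. Using the above
observations and the definition of variance, we can show that $\mathrm{Var}[W] \le E[W] + \gamma$.
Now, for any term $p = \Pr[\overline{U_{ij}} \wedge \overline{U_{i'j'}}]$ in $\gamma$, $p \le \min\{\Pr[\overline{U_{ij}}], \Pr[\overline{U_{i'j'}}]\} \le
(\Pr[\overline{U_{ij}}] + \Pr[\overline{U_{i'j'}}])/2$. Finally, since each edge has at most $2(d-1)$ other edges
that share an end-point with it, we get

\[
\mathrm{Var}[W] \le E[W] + \gamma \le E[W] + \sum_{(i,j) \in E} (2(d-1)/2) \cdot \Pr[\overline{U_{ij}}] = d\,E[W].
\]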



4.1 Vertex Cover on Expanders 

Suppose we have a vertex cover problem; i.e., $k$-vertex cover with $k = m$. The
LP relaxation here has “1” in place of “$z_{ij}$” in (1), and does not require the
variables $z_{ij}$ and the constraint (2). We focus here on the case of expander
graphs of constant average degree. That is, for some constants $c$ and $d$, we are
studying graphs where: (i) the number of edges $m$ is at most $nd$, and (ii) for any
set $X$ of vertices with $|X| \le n/2$, at least $c|X|$ vertices outside $X$ have a neighbor
in $X$. Since $k = m$, it is well known that we can efficiently compute an optimal
solution $x^*$ to the LP with all entries lying in $\{0, 1/2, 1\}$. Let $H = \{v_j \mid x_j^* = 1/2\}$
and $F = \{v_j \mid x_j^* = 1\}$, and let $y_H^*$ and $y_F^*$ denote the contributions of $H$ and $F$
to $y^*$. Also, since $W \le k = m$ always holds, $E[\max\{k - W, 0\}] = E[k - W] \le m\varepsilon^2$,
by Lemma 1(i). Thus, (3) shows that $E[C]$ is at most
$y_F^* + 2(1 - \varepsilon) y_H^* + m\varepsilon^2$. (The overall approach of: (i) conducting a randomized
rounding and then doing a greedy fixing of violated constraints, and (ii) using an
equality such as our “$E[\max\{k - W, 0\}] = E[k - W]$” here, is suggested in [?].
We next show how expansion is useful in bounding $E[C]$ well. However, in the
context of partial covering, an equality such as “$E[\max\{k - W, 0\}] = E[k - W]$”
does not hold; so, as discussed in Sections 4.2 and 4.3, new analysis approaches are
employed there.) Choosing $\varepsilon = y_H^*/m$, we get

\[
E[C] \le y_H^*(2 - y_H^*/m) + y_F^*. \tag{4}
\]

Case I: $|H| \le n/2$. By the LP constraints, the edges incident on vertices in $H$
must have their other end-point in $F$. Since $G$ is an expander, $|F| \ge c \cdot |H|$. Also,
$y_F^* = |F|$ and $y_H^* = |H|/2$. So, since $y^* = y_H^* + y_F^*$, we have $y_H^* = y^*/(1 + a)$ for
some $a \ge 2c$. We can now use (4) to get

\[
E[C] \le 2 y_H^* + y_F^* = (2 - a/(1 + a))\,y^* \le (2 - 2c/(1 + 2c))\,y^*.
\]

Case II: $|H| > n/2$. So, we have $y_H^* \ge n/4$. Bound (4) shows that $E[C] \le
(2 - y_H^*/m)\,y^*$; we have $m \le nd$ by assumption. So, $E[C] \le (2 - 1/(4d))\,y^*$.

Thus we see that $E[C] \le [2 - \min\{2c/(1 + 2c), 1/(4d)\}] \cdot y^*$; i.e., we get a
constant-factor approximation that is strictly better than 2.



4.2 $k$-Vertex Cover: Bounded-Degree Graphs

We now show that for any constant $d$, $k$-vertex cover on graphs of maximum degree
at most $d$ can be approximated to within $2(1 - \Omega(1/d))$, for any value of the
parameter $k$. We also prove that the integrality gap in this case is at most
$2(1 - \Omega(1/d))$. We start with a couple of useful tail bounds. First, suppose $X$ is a
sum of independent random variables $X_i$ each of which lies in $[0, 1]$; let $E[X] = \mu$.
Then for any $\delta \in [0, 1]$, the Chernoff bound shows that $\Pr[X \ge \mu(1 + \delta)]$ is at
most $e^{-\mu\delta^2/3}$. Next, suppose $X$ is a random variable with mean $\mu$ and variance
$\sigma^2$; suppose $a > 0$. The Chebyshev-Cantelli inequality (see, e.g., [?]) shows
that $\Pr[X - \mu \le -a] \le \sigma^2/(\sigma^2 + a^2)$. We now analyze the performance of our
basic algorithm (of randomized rounding of the LP solution followed by a simple
covering of a sufficient number of edges), for the $k$-vertex cover problem on graphs
with maximum degree bounded by some given constant $d$. The notation remains
the same. The main problem in adopting the method of Section 4.1 here is as follows.
Since $k$ equaled $m$ there, we could use the equality $E[\max\{k - W, 0\}] = E[k - W]$,
thus substantially simplifying the analysis. Such an equality is not true here;
also, $E[\max\{X, 0\}] \ge \max\{E[X], 0\}$ for any random variable $X$. (The two sides
of this inequality may differ a lot: if $X$ is the sum of $n$ i.i.d. random variables
each of which is uniformly distributed on $\{-1, 1\}$, then the r.h.s. is zero, while
the l.h.s. is $\Theta(\sqrt{n})$.) However, Lemma 1, Chebyshev-Cantelli, and a case analysis
of whether $k \ge 4d$ or not, can be used to show

\[
\Pr[W \le k(1 - \varepsilon^2) - 2\sqrt{kd}] \le 1/5. \tag{5}
\]

Next, for a suitably large constant $c_0$, we can assume that $k \ge c_0 d^5$. (Any
optimal solution has size at most $k$, since in an optimal solution, every vertex
should cover at least one new edge. So if $k$ is bounded by a constant, such as $c_0 d^5$,
then we can find an optimal solution in polynomial time by exhaustive search.)
Also, by adding all the constraints of the LP and simplifying, we get that $y^* \ge
k/d$. Thus, letting $\delta = 1/(3d)$, a Chernoff bound shows that immediately after
the randomized rounding, the probability of having more than $2y^*(1 - \varepsilon)(1 + \delta)$
vertices in our initial cover is at most 1/5 (if the constant $c_0$ is chosen large
enough). Recall (5). So, with probability at least $1 - (1/5 + 1/5) = 3/5$, the final
cover we produce is of size at most $2y^*(1 - \varepsilon)(1 + \delta) + k\varepsilon^2 + 2\sqrt{kd}$. We now
choose $\varepsilon = y^*(1 + \delta)/k$; since $y^* \ge k/d \ge c_0 d^4$ with $c_0$ sufficiently large, some
simplification shows that the final cover size is at most $2y^*(1 - \Omega(1/d))$.

4.3 $k$-Vertex Cover: Multiple Criteria

We now briefly consider multi-criteria $k$-vertex cover problems on arbitrary
graphs. Here, we are given a graph $G$ and, as usual, have to cover at least
$k$ edges. We are also given $\ell$ “weight functions” $w_i$, and want a cover that
is “good” w.r.t. all of these. More precisely, suppose we are given vectors
$w_i \in [0, 1]^n$, $i = 1, 2, \ldots, \ell$, and a fractional solution $x^*$ to the $k$-cover problem
on $G$. Let $w_i = (w_{i,1}, w_{i,2}, \ldots, w_{i,n})$, and define $y_i^* = \sum_j w_{i,j} x_j^*$ for $1 \le i \le \ell$.
We aim for an integral solution $z$ such that for each $i$, $y_i = \sum_j w_{i,j} z_j$ is not
“much above” $y_i^*$. Multi-criteria optimization has recently received much
attention, since participating individuals/organizations may have differing objective
functions, and we may wish to (reasonably) simultaneously satisfy all of them if
possible. The result we show here is that if $y_i^* \ge c \log^2(\ell + n)$ for all $i$ (where $c$
is a sufficiently large constant), then we can efficiently find an integral solution
$z$ with $y_i \le 2(1 + 1/\sqrt{\log(\ell + n)})\,y_i^*$ for each $i$.

5 Geometric Packing and Covering 

Recall this problem’s definition from Section 1. A polynomial-time approximation
scheme exists for the case when $k = n$ (full covering). The algorithm uses a
strategy called the shifting strategy. The strategy is based on a divide and
conquer approach. The area, $I$, enclosing the set of given points is divided into strips
of width $D$. Let $l$ be the shifting parameter. Groups of $l$ consecutive strips,
resulting in strips of width $lD$, are considered. For any fixed subdivision of $I$ into






R. Gandhi, S. Khuller, and A. Srinivasan 



strips of width $D$, there are $l$ different ways of partitioning $I$ into strips of width
$lD$. The $l$ partitions are denoted by $S_1, S_2, \ldots, S_l$. The solution to cover all the
points is obtained by finding the solution to cover the points for each partition
$S_j$, $1 \le j \le l$, and then choosing a minimum cost solution. A solution for each
partition is obtained by finding a solution to cover the points in each strip (of
width $lD$) of that partition and then taking the union of all such solutions. To
obtain a solution for each strip, the shifting strategy is re-applied to each strip.
This results in the partition of each strip into “squares” of side length $lD$. As
will be shown later, there exists an optimal covering for such squares.

We modify the use of shifting strategy for the case when k < n (partial 
covering). The obstacle in directly using the shifting strategy for the partial 
covering case is that we do not know the number of points that an optimal 
solution covers in each strip of a partition. This is not a problem with the full 
covering case because we know that any optimal solution would have to cover all 
the points within each strip of a partition. For the partial covering, this problem 
is overcome by “guessing” the number of points covered by an optimal solution 
in each strip. This is done by finding a solution for every possible value for the 
number of points that can be covered in each strip and storing each solution. A 
formal presentation is given below. 

Let $A$ be any algorithm that delivers a solution to cover the points in any
strip of width $lD$. Let $A(S_i)$ be the algorithm that applies $A$ to each strip of
the partition $S_i$ and outputs the union of all disks in a feasible solution. We
will find such a solution for each of the $l$ partitions and output the minimum.
Consider a partition $S_i$ containing $p$ strips of width $lD$. Let $n_j$ be the number
of points in strip $j$. Let $k_j$ be the number of points covered by OPT in strip
$j$. Since we do not know $k_j$, we will find feasible solutions to cover points for
all possible values of $k_j$. Note that $0 \le k_j \le k'_j = \min(k, n_j)$. A dynamic
programming formulation is as follows:

$$C(x, y) = \min_{0 \le i \le k'_x} \left( D_i^x + C(x-1,\, y-i) \right)$$

where $C(x, y)$ denotes the number of disks needed to cover $y$ points in strips $1..x$
and $D_i^x$ is the number of disks needed to cover $i$ points in strip $x$. Computing
$C(p, k)$ gives us the desired answer.
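The recurrence can be sketched in a few lines of Python. This is only an illustration under assumptions: the per-strip tables `D[x][i]` (disks needed to cover exactly `i` points in strip `x`) are taken as given inputs, the function name `combine_strips` is hypothetical, and the base case $C(0,0)=0$ is our choice for the empty prefix of strips.

```python
from math import inf

def combine_strips(D, k):
    """Return C(p, k) for the recurrence C(x, y) = min_i (D[x][i] + C(x-1, y-i)).

    D is a list of per-strip cost tables (assumed input): D[x][i] is the
    number of disks needed to cover i points in strip x (0-based here).
    """
    p = len(D)
    # C[x][y]: fewest disks covering y points using only the first x strips.
    C = [[inf] * (k + 1) for _ in range(p + 1)]
    C[0][0] = 0  # assumed base case: no strips, no points, no disks
    for x in range(1, p + 1):
        costs = D[x - 1]
        for y in range(k + 1):
            # Guess how many of the y points lie in strip x.
            for i in range(min(y, len(costs) - 1) + 1):
                candidate = costs[i] + C[x - 1][y - i]
                if candidate < C[x][y]:
                    C[x][y] = candidate
    return C[p][k]
```

With two strips whose tables are `[0, 1, 1, 2]` and `[0, 1, 2]`, covering three points costs two disks, either all three from the first strip or two from the first and one from the second.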

For each strip $s$, for $0 \le i \le k'_s$, $D_i^s$ can be calculated by recursive application
of the algorithm to the strip $s$. We partition the strip into squares of side length
$lD$. We can find optimal coverings of points in such a square by exhaustive search.
With $O(l^2)$ disks of diameter $D$ we can cover an $lD \times lD$ square compactly, thus we
never need to consider more disks for one square. Further, we can assume that
any disk that covers at least two of the given points has two of these points on its
border. Since there are only two ways to draw a circle of given diameter through
two given points, we only have to consider $2\binom{n'}{2}$ possible disk positions, where
$n'$ is the number of given points in the considered square. Thus, we have to check
for at most $n'^{O(l^2)}$ arrangements of disks.
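The claim that only two circles of a given diameter pass through two given points can be made concrete. The sketch below assumes points are `(x, y)` tuples; the helper name `disk_centers` is hypothetical. It returns the two candidate centers, or an empty list when the points coincide or are farther apart than the diameter.

```python
from math import hypot, sqrt

def disk_centers(p, q, diameter):
    """Centers of the disks of the given diameter with p and q on the border."""
    r = diameter / 2.0
    dx, dy = q[0] - p[0], q[1] - p[1]
    d = hypot(dx, dy)
    if d == 0 or d > diameter:
        return []  # no such disk (or infinitely many when p == q)
    # Midpoint of the chord pq and distance from it to either center.
    mx, my = (p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0
    h = sqrt(max(r * r - (d / 2.0) ** 2, 0.0))
    # Unit vector perpendicular to pq.
    ux, uy = -dy / d, dx / d
    return [(mx + h * ux, my + h * uy), (mx - h * ux, my - h * uy)]
```

Enumerating all pairs of the $n'$ points in a square with this helper yields the $2\binom{n'}{2}$ candidate positions used by the exhaustive search.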
Let $Z^A$ be the value of the solution delivered by algorithm $A$. The shift
algorithm $S_A$ is defined for a local algorithm $A$. Let $r_B$ denote the performance
ratio of an algorithm $B$; that is, $r_B$ is defined as the supremum of $Z^B / |OPT|$
over all problem instances. We can show:

Lemma 2. $r_{S_A} \le r_A \left(1 + \frac{1}{l}\right)$, where $A$ is the local algorithm and $l$ is the shifting
parameter.

Theorem 4. The above algorithm yields a PTAS with performance ratio at most
$(1 + 1/l)^2$.

Proof. We use two nested applications of the shifting strategy to solve the problem.
The above lemma applied to the first application of the shifting strategy
relates the performance ratio of the final solution, $r_{S_A}$, to that of the solution
for each strip, $r_A$: $r_{S_A} \le r_A (1 + 1/l)$. The lemma when applied to the
second application of the shifting strategy relates $r_A$ to the performance ratio of the
solution to each square, say $r_{A'}$. Thus, $r_A \le r_{A'} (1 + 1/l)$. But since we obtain
an optimal solution for each square, $r_{A'} = 1$. Thus we have $r_{S_A} \le (1 + 1/l)^2$.
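To see how the shifting parameter trades accuracy for running time, one can compute the smallest $l$ for which the bound $(1 + 1/l)^2$ from two nested applications of the shifting lemma drops below a target $1 + \epsilon$. This is a minimal sketch; the function name `shifting_parameter` and the argument `eps` are our own.

```python
def shifting_parameter(eps):
    """Smallest integer l >= 1 with (1 + 1/l)**2 <= 1 + eps."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    l = 1
    while (1 + 1 / l) ** 2 > 1 + eps:
        l += 1
    return l
```

For a target ratio of 1.1 this gives $l = 21$, since $(21/20)^2 = 1.1025$ is too large while $(22/21)^2 \approx 1.0975$ suffices.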
Abstract. A new framework for analyzing online bin packing algorithms
is presented. This framework presents a unified way of explaining
the performance of algorithms based on the Harmonic approach.
Within this framework, it is shown that a new algorithm,
Harmonic++, has asymptotic performance ratio at most 1.58889. It
is also shown that the analysis of Harmonic+1 presented by Richey is
incorrect; this is a fundamental logical flaw, not an error in calculation
or an omitted case. The asymptotic performance ratio of Harmonic+1
is at least 1.59217. Thus Harmonic++ provides the best upper bound
for the online bin packing problem to date.

Keywords: bin packing, online algorithms. 



1 Introduction 

Bin packing is one of the oldest and most well studied problems in computer
science. The study of this problem dates back to the early 1970’s, when
computer science was still in its formative phase — ideas which originated in the 
study of the bin packing problem have helped shape computer science as we 
know it today. The influence and importance of this problem are witnessed by 
the fact that it has spawned off whole areas of research, including the fields of 
online algorithms and approximation algorithms. 

Problem Definition: In the bin packing problem, we receive a sequence $\sigma$ of
pieces $p_1, p_2, \ldots, p_n$. We use the words piece and item synonymously. Each piece
has a fixed size in $(0, 1]$. In a slight abuse of notation, we use $p_i$ to indicate both
the $i$th piece and its size. The usage should be obvious from the context. We have
an infinite number of bins each with capacity 1. Each piece must be assigned to 
a bin. Further, the sum of the sizes of the items assigned to any bin may not 
exceed its capacity. A bin is empty if no piece is assigned to it, otherwise it is 
used. The goal is to minimize the number of bins used. 

* This research was partially supported by an LSU Council on Research summer sti- 
pend and by the Research Competitiveness Subprogram of the Louisiana Board of 
Regents. 
In the online version of this problem, each piece must be assigned in turn,
without knowledge of the next pieces. Since it is impossible in general to pro- 
duce the best possible solution when computation occurs online, we consider 
approximation algorithms. Basically, we want to find an algorithm which uses a 
number of bins which is within a constant factor of the minimum possible num- 
ber, no matter what the input is. This constant factor is known as the asymptotic 
performance ratio. 

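We define the asymptotic performance ratio more precisely. For a given input
sequence $\sigma$, let $\mathrm{cost}_A(\sigma)$ be the number of bins used by algorithm $A$ on $\sigma$. Let
$\mathrm{cost}(\sigma)$ be the minimum possible number of bins used to pack pieces in $\sigma$. The
asymptotic performance ratio for an algorithm $A$ is defined to be

$$R_A^\infty = \limsup_{n \to \infty} \, \sup_{\sigma} \left\{ \frac{\mathrm{cost}_A(\sigma)}{\mathrm{cost}(\sigma)} \;\middle|\; \mathrm{cost}(\sigma) = n \right\}.$$

Let $\mathcal{O}$ be the set of all online bin packing algorithms. The optimal asymptotic
performance ratio is defined to be $R_{\mathrm{OPT}}^\infty = \inf_{A \in \mathcal{O}} R_A^\infty$. Our goal is to find an
algorithm with asymptotic performance ratio close to $R_{\mathrm{OPT}}^\infty$.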

Previous Results: The online bin packing problem was first investigated by
Johnson. He showed that the Next Fit algorithm has performance ratio 2.
Subsequently, it was shown by Johnson, Demers, Ullman, Garey and Graham
that the First Fit algorithm has performance ratio $\frac{17}{10}$. Yao showed that
Revised First Fit has performance ratio $\frac{5}{3}$, and further showed that no online
algorithm has performance ratio less than $\frac{3}{2}$. Brown and Liang independently
improved this lower bound to 1.53635. The lower bound currently stands
at 1.54014, due to van Vliet. Define $u_{i+1} = u_i(u_i - 1) + 1$, $u_1 = 2$ and

$$h_\infty = \sum_{i=1}^{\infty} \frac{1}{u_i - 1} \approx 1.69103.$$
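The constant $h_\infty$ converges extremely fast because the $u_i$ grow doubly exponentially. The following sketch (function names are ours) evaluates the recurrence exactly with rational arithmetic and sums the first few terms.

```python
from fractions import Fraction

def sylvester_terms(count):
    """First `count` terms of u_1 = 2, u_{i+1} = u_i * (u_i - 1) + 1."""
    terms, u = [], 2
    for _ in range(count):
        terms.append(u)
        u = u * (u - 1) + 1
    return terms

def h_partial(count):
    """Partial sum of 1 / (u_i - 1) over the first `count` terms."""
    return sum(Fraction(1, u - 1) for u in sylvester_terms(count))
```

The first terms are 2, 3, 7, 43, 1807, so the series begins $1 + \frac12 + \frac16 + \frac1{42} + \frac1{1806} + \cdots$, and six terms already agree with 1.69103 to the digits shown.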



Lee and Lee showed that the Harmonic algorithm, which uses bounded space,
achieves a performance ratio arbitrarily close to $h_\infty$. They further showed that
no bounded space online algorithm achieves a performance ratio less than $h_\infty$.
In addition, they developed the Refined Harmonic algorithm, which they
showed to have a performance ratio of $\frac{373}{228} < 1.63597$. The next improvements
were Modified Harmonic and Modified Harmonic 2. Ramanan, Brown,
Lee and Lee showed that these algorithms have performance ratios of $\frac{538}{333} <
1.61562$ and $\frac{239091}{148304} < 1.61217$, respectively. The best result to date is that
of Richey. He presents an algorithm called Harmonic+1 and claims that it
has performance ratio 1.58872.

Our Results: In this paper, we present a general framework for analyzing a
large class of online bin packing algorithms. This class includes Harmonic,
Refined Harmonic, Modified Harmonic, Modified Harmonic 2 and
Harmonic+1. In fact, we show that all these algorithms are just special cases
of a general algorithm which we call Super Harmonic. We present a general
analysis of Super Harmonic. Our analysis is qualitatively different from previous
ones, in that we reduce the problem of analyzing an instance of Super
Harmonic to that of solving a specific knapsack problem instance. We develop a
branch and bound algorithm for solving such knapsack problems. Thus we provide
a general computer assisted method of proving upper bounds for all algorithms
that can be expressed in terms of Super Harmonic. This leads us to a fundamental
logical flaw in the analysis of Harmonic+1. We show that the performance
ratio of Harmonic+1 is at least 1.59217. In light of this finding, we develop
a new algorithm called Harmonic++, and show that it has asymptotic performance
ratio at most 1.58889. Thus Harmonic++ provides the best upper
bound for the online bin packing problem to date. We also note that 1.58333
is a lower bound for any Super Harmonic algorithm, thus Harmonic++ has
performance reasonably close to the best possible Super Harmonic algorithm.

Due to space constraints, several proofs and a full description of the algorithm 
are omitted. They can be found in an appendix, available at: 

http://www.csc.lsu.edu/~seiden/append.ps.Z



2 Interval Classification Algorithms 

An interval classification algorithm operates by classifying pieces according to
a set of predefined intervals. Let $t_1 = 1 > t_2 > \cdots > t_n > t_{n+1} > 0$ be real
numbers. We define $\epsilon = t_{n+1}$ and $t_{n+2} = 0$. The interval $I_j$ is defined to be
$(t_{j+1}, t_j]$ for $j = 1, \ldots, n+1$. Note that these intervals are disjoint and that they
cover $(0, 1]$. A piece of size $s$ has type $j$ if $s \in I_j$.
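As an illustration of the classification step, the sketch below maps a piece size to its type given the breakpoints. The function name `piece_type` and the list representation of $t_1, \ldots, t_{n+1}$ are assumptions made for this example.

```python
from fractions import Fraction

def piece_type(s, t):
    """Return j such that t[j+1] < s <= t[j] in the 1-based notation of the text.

    t is the decreasing list [t_1, ..., t_{n+1}] with t_1 = 1; the extra
    breakpoint t_{n+2} = 0 is appended internally.
    """
    if not 0 < s <= 1:
        raise ValueError("piece sizes lie in (0, 1]")
    bounds = list(t) + [0]
    for j in range(1, len(bounds)):
        # bounds[j - 1] is t_j and bounds[j] is t_{j+1}.
        if bounds[j] < s <= bounds[j - 1]:
            return j
    raise ValueError("breakpoints do not cover (0, 1]")
```

With the hypothetical breakpoints $1, \frac12, \frac13, \frac14$ (so $n = 3$ and $\epsilon = \frac14$), a piece of size exactly $\frac12$ has type 2 because each interval is closed on the right.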

The Next Fit algorithm is used to pack all items of size at most $\epsilon$. The
algorithm maintains a single open bin. If the current item fits into the open
bin, it is placed there. Otherwise, the open bin is closed and a new open bin
is allocated. Obviously, this algorithm is online, runs in linear time and uses
constant space. The following well known lemma shall prove useful:

Lemma 1. If the sum of the sizes of the items packed by Next Fit is $x$, and
each item has size at most $\epsilon$, then the number of bins used is at most $x/(1 - \epsilon) + 1$.

Proof. Every bin packed by Next Fit, except possibly the one open bin, contains
pieces whose total size is at least $1 - \epsilon$. Therefore, the total number of bins
used is at most $\lceil x/(1 - \epsilon) \rceil \le x/(1 - \epsilon) + 1$. □
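Next Fit is short enough to state in full. The sketch below is one straightforward rendering, using exact rationals so that the bound of Lemma 1 can be checked without rounding concerns; the function name `next_fit` is ours.

```python
from fractions import Fraction

def next_fit(sizes):
    """Pack sizes (each in (0, 1]) with Next Fit; return the number of bins used."""
    bins_used = 0
    space_left = Fraction(0)  # no bin is open before the first item arrives
    for s in sizes:
        s = Fraction(s)
        if s > space_left:
            # Close the current bin (if any) and open a new one.
            bins_used += 1
            space_left = Fraction(1)
        space_left -= s
    return bins_used
```

For seven items of size $\frac{3}{10}$ the algorithm uses 3 bins, while Lemma 1 with $x = 2.1$ and $\epsilon = 0.3$ allows up to $2.1/0.7 + 1 = 4$.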

A packing is a tuple $q = \langle q_1, \ldots, q_n \rangle$ over $\mathbb{N}$ such that $\sum_{i=1}^{n} q_i t_{i+1} < 1$.
Intuitively, a packing describes the contents of one of the algorithm’s bins. I.e.
$q_i$ is the number of items of type $i$ contained in the bin. All interval classification
algorithms operate by placing items according to some predetermined set
of packings. We call the set of bins with a particular packing a group.

An important subset of interval classification algorithms can be described 
in terms of one general algorithm, which we call Super Harmonic. All of the 
algorithms considered here fall into this sub-class. 
An instance of the Super Harmonic algorithm is described by the following
parameters: integers $n$ and $K$; real numbers $1 > t_2 > \cdots > t_n > t_{n+1} = \epsilon > 0$,
$\alpha_1, \ldots, \alpha_n \in [0, 1]$ and $0 < \Delta_1 < \cdots < \Delta_K < \frac{1}{2}$; and a function
$\phi : \{1, \ldots, n\} \to \{0, \ldots, K\}$. Define $t_1 = 1$ and $\Delta_0 = 0$. In the following
paragraphs, we describe the operation of Super Harmonic.

Upon receipt, each item of type $i \le n$ is assigned a color, red or blue. The
algorithm uses two sets of counters, $e_1, \ldots, e_n$ and $s_1, \ldots, s_n$, all of which are
initially zero. The total number of type $i$ items is $s_i$, while the number of type $i$
red items is $e_i$. For $1 \le i \le n$, the invariant $e_i = \lfloor \alpha_i s_i \rfloor$ is maintained.

$\beta_i = \lfloor 1/t_i \rfloor$ is the number of type $i$ items which fit in a bin. Blue items of
type $i$ are placed $\beta_i$ in a bin, as in Harmonic.

$\delta_i = 1 - t_i \beta_i$ is the amount of space left when a bin is filled with $\beta_i$ type $i$
items. If possible, we would like to use this space to pack red items. We require
that $\phi$ satisfy $\Delta_{\phi(i)} \le \delta_i$. Intuitively, $\mathcal{D} = \{\Delta_1, \ldots, \Delta_K\}$ describes the set of
spaces into which red items can be placed. $\Delta_{\phi(i)}$ is the amount of space used
to hold red items in a bin which holds blue items of type $i$. $\phi(i) = 0$ indicates
that no red items can be accepted. To ensure that every red item potentially can
be placed, we require that $\alpha_i = 0$ for all $i$ such that $t_i > \Delta_K$. Define $\gamma_i = 0$ if
$t_i > \Delta_K$ and $\gamma_i = \max\{1, \lfloor \Delta_1 / t_i \rfloor\}$ otherwise. This is the number of red items of
type $i$ that the algorithm places together in a bin. Note that this is the maximum
number guaranteed to fit in every space in $\mathcal{D}$. Define

$$\varphi(i) = \min\{ j \mid t_i \le \Delta_j,\ 1 \le j \le K \}.$$

Intuitively, $\varphi(i)$ is the index of the smallest space in $\mathcal{D}$ into which a red item of
type $i$ can be placed.
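The derived quantities can be tabulated directly from the breakpoints and the red-space sizes. The sketch below assumes the definitions as we read them from the text, namely $\beta_i = \lfloor 1/t_i \rfloor$, $\delta_i = 1 - t_i\beta_i$, $\gamma_i = \max\{1, \lfloor \Delta_1/t_i \rfloor\}$ (or 0 when $t_i > \Delta_K$) and the index of the smallest space that accepts a red type $i$ item. The function name and the sample parameters are hypothetical and are not the parameters of any algorithm in the paper.

```python
from fractions import Fraction
from math import floor

def derived_parameters(t, deltas):
    """For each t_i return (beta_i, delta_i, gamma_i, smallest_space_index).

    t      : [t_1, ..., t_n], decreasing, as Fractions
    deltas : [Delta_1, ..., Delta_K], increasing, as Fractions
    The last entry is None when no space in deltas can hold a type i item.
    """
    rows = []
    for t_i in t:
        beta = floor(1 / t_i)          # blue items of this type per bin
        delta = 1 - t_i * beta         # space left over in a full blue bin
        if t_i > deltas[-1]:
            gamma, smallest = 0, None  # too large to ever be red
        else:
            gamma = max(1, floor(deltas[0] / t_i))
            smallest = min(j for j, d in enumerate(deltas, start=1) if t_i <= d)
        rows.append((beta, delta, gamma, smallest))
    return rows
```

For example, with $\Delta = (\frac14, \frac13)$ a breakpoint $t_i = \frac18$ gives $\beta_i = 8$ and $\gamma_i = 2$, while $t_i = \frac23$ leaves $\delta_i = \frac13$ free but can never itself be packed as a red item.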

The bin groups used are named:

$\{ i \mid \phi(i) = 0,\ 1 \le i \le n \}$,

$\{ (i, ?) \mid \phi(i) \ne 0,\ 1 \le i \le n \}$,

$\{ (?, j) \mid \alpha_j \ne 0,\ 1 \le j \le n \}$,

$\{ (i, j) \mid \phi(i) \ne 0,\ \alpha_j \ne 0,\ \gamma_j t_j \le \Delta_{\phi(i)},\ 1 \le i \le n,\ 1 \le j \le n \}$.

We call these groups monochromatic, indeterminate blue, indeterminate red and
bichromatic, respectively. Collectively, we call the monochromatic and bichromatic
groups final groups.

The monochromatic group $i$ contains bins which hold only blue items of
type $i$. There is one open bin in each of these groups; this bin has fewer than $\beta_i$
items. The closed bins all contain $\beta_i$ items.

The bichromatic group $(i, j)$ contains bins which contain blue items of type $i$
along with red items of type $j$. A closed bin in this group contains $\beta_i$ type $i$
items and $\gamma_j$ type $j$ items. There are at most three open bins.

The indeterminate blue group $(i, ?)$ contains bins which hold only blue items
of type $i$. These bins are all open, but only one has fewer than $\beta_i$ items.

The indeterminate red group $(?, j)$ contains bins which hold only red items
of type $j$. Again, these bins are all open, but only one has fewer than $\gamma_j$ items.
Essentially, the algorithm tries to minimize the number of indeterminate bins,
while maintaining all the aforementioned invariants. I.e. we try to place red and
blue items together whenever possible; when this is not possible we place them
in indeterminate bins in hope that they can later be so combined. A formal
description of Super Harmonic is displayed in Figure 1. The labels (A) through
(F) are used in the proof of Lemma 2.

Initialize $e_i \leftarrow 0$ and $s_i \leftarrow 0$ for $1 \le i \le n$.
For each piece $p$:
    $i \leftarrow$ type of $p$.
    If $i = n + 1$ place $p$ using Next Fit.
    Else:
        $s_i \leftarrow s_i + 1$.
        If $e_i < \lfloor \alpha_i s_i \rfloor$:
            $e_i \leftarrow e_i + 1$.
            Color $p$ red.
            (A) If, for any $j$, there is an open bin in group $(j, i)$ or $(?, i)$ with fewer
                than $\gamma_i$ type $i$ items, then place $p$ in this bin.
            (B) Else if there is some bin in group $(j, ?)$ such that $\Delta_{\phi(j)} \ge \gamma_i t_i$, then
                place $p$ in it and change the group of this bin to $(j, i)$.
            (C) Otherwise, open a new group $(?, i)$ bin and place $p$ in it.
        Else:
            Color $p$ blue.
            If $\phi(i) = 0$:
                If there is an open bin in group $i$ with fewer than $\beta_i$ items, then
                place $p$ in this bin.
                If not, open a new group $i$ bin and place $p$ there.
            Else:
                (D) If, for any $j$, there is an open bin in group $(i, j)$ with fewer than $\beta_i$
                    type $i$ items, then place $p$ in this bin.
                Else if there is an open bin in group $(i, ?)$ with fewer than $\beta_i$ type $i$
                    items, then place $p$ in this bin.
                (E) Else if there is some bin in group $(?, j)$ such that $\Delta_{\phi(i)} \ge \gamma_j t_j$ then
                    place $p$ in it and change the group of this bin to $(i, j)$.
                (F) Otherwise, open a new group $(i, ?)$ bin and place $p$ there.

Fig. 1. The Super Harmonic Algorithm.



Lemma 2. Super Harmonic maintains the following invariants: (1) The
number of red items of type $i$ is $\lfloor \alpha_i s_i \rfloor$. (2) At most one bin is open in any
group $i$. (3) At most three bins are open in any group $(i, j)$. (4) At most one bin
has fewer than $\beta_i$ items in any group $(i, ?)$. (5) At most one bin has fewer than
$\gamma_i$ items in any group $(?, i)$.

Due to space considerations, the proof is given in the appendix.

Corollary 3.1 of Ramanan et al. implies the following result:

Lemma 3 (Ramanan et al.). For all choices of parameters, the asymptotic
performance ratio of Super Harmonic is at least $\frac{19}{12} > 1.58333$.



3 Weighting Systems 



Analysis based on weighting functions was introduced in earlier work, and is used in the
subsequent work on interval classification bin packing algorithms. We
generalize the idea of a weighting function here.

Let $\mathbb{R}$ and $\mathbb{N}$ be the sets of real numbers and non-negative integers, respectively.

A weighting system for algorithm $A$ is a tuple $(\mathbb{R}^m, \mathbf{w}_A, \xi_A)$. $\mathbb{R}^m$ is a vector
space over the real numbers with dimension $m$. The function $\mathbf{w}_A : (0, 1] \to \mathbb{R}^m$
is called the weighting function. For each $j \le n$, $\mathbf{w}_A(x)$ is constant for $x \in I_j$.
The function $\xi_A : \mathbb{R}^m \to \mathbb{R}$ is called the consolidation function. We have

$$\xi_A(\mathbf{x}) = \xi_A^i(\mathbf{x}), \quad \text{if } \mathbf{x} \in D_i$$

for some set $\xi_A^1, \ldots, \xi_A^\Lambda$ of linear functions and some set $D_1, \ldots, D_\Lambda$ of disjoint
domains covering $\mathbb{R}^m$. We require that $\xi_A$ is continuous and has the scalability
property: $\xi_A(a\mathbf{x}) = a\,\xi_A(\mathbf{x})$ for all $\mathbf{x} \in \mathbb{R}^m$ and $a \in \mathbb{R}$. Since $\xi_A$ is continuous,
and each piece is linear, the boundaries defining each domain are defined by at
most $\Lambda - 1$ linear functions. I.e. each domain can be described using at most
$\lambda_i < \Lambda$ constraints:

$$\mathbf{x} \in D_i \iff \begin{cases} \mathbf{x} \cdot \mathbf{d}_{i,1} \ge 0 \\ \mathbf{x} \cdot \mathbf{d}_{i,2} \ge 0 \\ \quad \vdots \\ \mathbf{x} \cdot \mathbf{d}_{i,\lambda_i} \ge 0. \end{cases}$$

Finally, for $(\mathbb{R}^m, \mathbf{w}_A, \xi_A)$ to be a weighting system we must have

$$\mathrm{cost}_A(\sigma) \le \xi_A\!\left( \sum_{i=1}^{N} \mathbf{w}_A(p_i) \right) + O(1),$$

for all input sequences $\sigma$. Intuitively, in the simplest case, the weight of a piece
indicates the maximum portion of a bin that it can occupy.

Weighting systems can be used to analyze Harmonic, Refined Har- 
monic, Modified Harmonic, Modified Harmonic 2, Harmonic+1 and 
HARMONIC++. In fact, all these algorithms are instances of Super Harmonic. 
We develop a general analysis of Super Harmonic and apply this analysis to 
prove upper bounds for these specific algorithms. 

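We define a $2K + 1$ dimensional weighting system for Super Harmonic. In
order to express the vectors in compact format, we define the unit basis vectors:
$\mathbf{b}_0, \mathbf{b}_1, \ldots, \mathbf{b}_K, \mathbf{r}_1, \ldots, \mathbf{r}_K$. The weighting function is

$$\mathbf{w}_{SH}(x) = \begin{cases} (1 - \alpha_i)\,\dfrac{\mathbf{b}_{\phi(i)}}{\beta_i} + \alpha_i\,\dfrac{\mathbf{r}_{\varphi(i)}}{\gamma_i} & \text{if } x \in I_i \text{ with } i \le n, \\[2ex] x\,\dfrac{\mathbf{b}_0}{1 - \epsilon} & \text{if } x \in I_{n+1}. \end{cases}$$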
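The consolidation function is

$$\xi_{SH}(\mathbf{x}) = \mathbf{x} \cdot \mathbf{b}_0 + \max_{1 \le k \le K+1} \min\left\{ \sum_{i=k}^{K} \mathbf{x} \cdot \mathbf{r}_i + \sum_{i=1}^{K} \mathbf{x} \cdot \mathbf{b}_i,\ \ \sum_{i=1}^{K} \mathbf{x} \cdot \mathbf{r}_i + \sum_{i=1}^{k-1} \mathbf{x} \cdot \mathbf{b}_i \right\}.$$

Lemma 4. For all $\sigma$, $\mathrm{cost}_{SH}(\sigma) \le \xi_{SH}\!\left( \sum_{i=1}^{N} \mathbf{w}_{SH}(p_i) \right) + O(1)$.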

Due to space considerations, the proof is given in the appendix. 



4 General Analysis with Weighting Systems 

We now turn our attention to analyzing algorithms. We begin by develop- 
ing general techniques applicable to the analysis of any interval classification 
algorithm — we show how weighting systems can be used to upper bound the 
asymptotic performance ratio of a given algorithm. We then focus our analysis 
on Super Harmonic. 

Suppose we have an online interval classification algorithm $A$, with weighting
system $(\mathbb{R}^m, \mathbf{w}_A, \xi_A)$. Fix an input $\sigma$.

Consider the optimal offline solution for $\sigma$. Suppose some bin in the optimal
solution is not full. Let $x$ be the sum of the sizes of the pieces in this bin. Then add a
piece of size $1 - x$ to the end of our sequence. The cost of the optimal solution
does not increase, whereas the cost to $A$ cannot decrease. Therefore, when upper
bounding the performance ratio of $A$, we may assume that each bin in the optimal
solution is full.

A pattern is a tuple $q = \langle q_1, \ldots, q_n \rangle$ over $\mathbb{N}$ such that $\sum_{i=1}^{n} q_i t_{i+1} < 1$. Intuitively,
a pattern describes the contents of a bin in the optimal offline solution.
The reader should contrast this with the definition of a packing given earlier.
The weight of pattern $q$ is

$$\mathbf{w}_A(q) = \mathbf{w}_A\!\left( 1 - \sum_{i=1}^{n} q_i t_{i+1} \right) + \sum_{i=1}^{n} q_i\, \mathbf{w}_A(t_i).$$

Define $Q$ to be the set of all patterns $q$. Note that $Q$ is necessarily finite.

A distribution is a function $\chi : Q \to \mathbb{R}_{\ge 0}$ such that $\sum_{q \in Q} \chi(q) = 1$. Given
$\sigma$, the optimal solution is defined by the numbers and types of items it places in each of the bins
it uses. Specifically, it is defined by a distribution $\chi$. It uses $\mathrm{cost}(\sigma)\chi(q)$ bins
containing items as described by the pattern $q$.

To show that $A$ has performance ratio at most $c$, we show that

$$\frac{\mathrm{cost}_A(\sigma)}{\mathrm{cost}(\sigma)} \le \frac{1}{\mathrm{cost}(\sigma)}\, \xi_A\!\left( \mathrm{cost}(\sigma) \sum_{q \in Q} \chi(q)\, \mathbf{w}_A(q) \right) + o(1) = \xi_A\!\left( \sum_{q \in Q} \chi(q)\, \mathbf{w}_A(q) \right) + o(1)$$

is at most $c$, for all $\sigma$. The second step follows from the scalability of $\xi_A$.
We are therefore led to consider the following optimization problem: Maximize
$\xi_A^i(\mathbf{x})$ subject to

$$\mathbf{x} = \sum_{q \in Q} \chi(q)\, \mathbf{w}_A(q); \qquad (1)$$

$$0 \le \mathbf{x} \cdot \mathbf{d}_{i,j}, \quad \text{for } 1 \le j \le \lambda_i; \qquad (2)$$

$$0 \le \chi(q), \quad \text{for } q \in Q; \qquad (3)$$

$$1 = \sum_{q \in Q} \chi(q); \qquad (4)$$

$$i \in \{1, \ldots, \Lambda\}; \qquad (5)$$

over integer variable $i$, real variables $\chi(q)$, $q \in Q$ and real vector $\mathbf{x}$. The value
of this mathematical program, which we name $\mathcal{P}$, upper bounds the asymptotic
performance ratio of $A$. Fix $i = j$ and call the resulting linear program $\mathcal{P}_j$. We
can solve $\mathcal{P}$ by solving $\mathcal{P}_j$ for $j = 1, \ldots, \Lambda$ and taking the maximum value. The
following lemma tells us something about the structure of a solution to $\mathcal{P}_j$:

Lemma 5. For $1 \le j \le \Lambda$, there exists an optimal feasible solution to $\mathcal{P}_j$ where
$\chi(q)$ is non-zero at at most $\lambda_j + 1$ patterns $q$.

Proof. $\mathcal{P}_j$ is a $|Q| - 1$ dimensional linear program, since $\mathbf{x}$ and (4) may be
removed by substitution. By the theory of linear programming, the optimal
solution is achieved at some vertex of the polytope of feasible solutions defined
by (2) and (3). At a vertex of this polytope, $|Q| - 1$ inequalities are satisfied
with equality. Of these, at least $|Q| - 1 - \lambda_j$ must be of the form (3), and each
of these implies that some value of $\chi$ is zero. The total number of variables is
$|Q|$ and $|Q| - (|Q| - 1 - \lambda_j) = \lambda_j + 1$. □

For certain types of consolidation functions, stronger results are possible: 

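Lemma 6. If $\xi_A(\mathbf{x}) = \max_{1 \le \ell \le \Lambda} \xi_A^\ell(\mathbf{x})$ for some set $\xi_A^1, \ldots, \xi_A^\Lambda$ of linear functions
then there exists an optimal feasible solution to $\mathcal{P}$ where $\chi(q)$ is non-zero
at at most one pattern $q$.

Proof. Suppose distribution $\chi$ defines an optimal feasible solution. The objective
value achieved is

$$\max_{1 \le j \le \Lambda} \xi_A^j\!\left( \sum_{q \in Q} \chi(q)\, \mathbf{w}_A(q) \right) = \sum_{q \in Q} \chi(q)\, \xi_A^\ell(\mathbf{w}_A(q)) \le \max_{q \in Q} \xi_A^\ell(\mathbf{w}_A(q)) = \xi_A^\ell(\mathbf{w}_A(q^*)),$$

for some $1 \le \ell \le \Lambda$ and $q^* \in Q$. The first step uses the linearity of $\xi_A^1, \ldots, \xi_A^\Lambda$,
while the second uses the fact that $\chi$ is a convex combination over $Q$. So the
distribution

$$\chi^*(q) = \begin{cases} 1 & \text{if } q = q^*, \\ 0 & \text{otherwise,} \end{cases}$$

achieves at least the objective value achieved by $\chi$. We need to show that $\chi^*$ is
feasible for $i = \ell$. Let $\mathbf{x}' = \mathbf{w}_A(q^*)$. If $\mathbf{x}'$ is infeasible for $i = \ell$
we have $\xi_A^j(\mathbf{x}') > \xi_A^\ell(\mathbf{x}')$ for some $j \ne \ell$; this would
contradict the optimality of $\chi$. □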

Note that the preceding lemma is applicable in the analysis of Harmonic,
Refined Harmonic and Modified Harmonic, but not Harmonic+1. The earlier
analyses of these algorithms use it implicitly. As we shall see, Richey uses it incorrectly. Using
Lemma 6 our problem is reduced to that of finding the single pattern $q$ which
maximizes $\xi_A(\mathbf{w}_A(q))$. This leads us to consider the mathematical program:
Maximize $\xi_A^i(\mathbf{x})$ subject to

$$\mathbf{x} = \mathbf{w}_A(y) + \sum_{j=1}^{n} q_j\, \mathbf{w}_A(t_j); \qquad (6)$$

$$y = 1 - \sum_{i=1}^{n} q_i t_{i+1}; \qquad (7)$$

$$0 \le \mathbf{x} \cdot \mathbf{d}_{i,j}, \quad \text{for } 1 \le j \le \lambda_i; \qquad (8)$$

$$y > 0; \qquad (9)$$

$$q_j \in \mathbb{N}, \quad \text{for } 1 \le j \le n; \qquad (10)$$

$$i \in \{1, \ldots, \Lambda\}; \qquad (11)$$

over variables $\mathbf{x}, y, i, q_1, \ldots, q_n$. Intuitively, $q$ is a pattern; $q_j$ is the number of
type $j$ pieces in $q$. $y$ is an upper bound on the space available for type $n + 1$
pieces. Note that strict inequality is required in (9) because a type $j$ piece is
strictly larger than $t_{j+1}$. Call this integer linear program $\mathcal{P}$. The value of $\mathcal{P}$ upper
bounds the asymptotic performance ratio when the consolidation function
satisfies the conditions of Lemma 6. One can think of $\mathcal{P}$ as a knapsack problem:
The adversary must pack items into a knapsack of size 1. The profit for
a knapsack is found by applying the consolidation function to the sum of the
weights.

The following lemma allows for further simplification in the case that the 
algorithm under consideration uses Harmonic to pack items below a certain 
size: 



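Lemma 7. Let $\ell$ and $k < \ell$ be positive integers, and $y \le 1/k$ be a positive real
number. The mathematical program: Maximize

$$\sum_{i=1}^{\ell - k} \frac{v_i}{k + i - 1} + \frac{\ell}{\ell - 1}\,(y - z)$$

subject to

$$z < y;$$

$$z = \sum_{i=1}^{\ell - k} \frac{v_i}{k + i};$$

$$v_i \in \mathbb{N}, \quad \text{for } 1 \le i \le \ell - k;$$

over variables $v_1, \ldots, v_{\ell - k}$ and $z$ has value $\Gamma(y, k, \ell)$ where

$$\Gamma(x, i, j) = \begin{cases} \dfrac{j}{j - 1}\, x & \text{if } i = j, \\[2ex] \dfrac{1}{i} + \Gamma\!\left( x - \dfrac{1}{i + 1},\ i,\ j \right) & \text{if } x > \dfrac{1}{i + 1}, \\[2ex] \Gamma(x, i + 1, j) & \text{otherwise.} \end{cases}$$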

Due to space considerations, the proof is given in the appendix. 

Note that the lemma implies that Harmonic has performance ratio
$\Gamma(1, 1, n + 1)$. Further, it is easily seen that $\lim_{n \to \infty} \Gamma(1, 1, n + 1) = h_\infty$.
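The recursion behind the value of Lemma 7 is easy to evaluate exactly. The sketch below assumes our reading of the three cases: return $\frac{j}{j-1}x$ when $i = j$, take an item of weight $\frac1i$ and recurse on $x - \frac{1}{i+1}$ when $x > \frac{1}{i+1}$, and otherwise advance $i$. It is written iteratively, requires $j \ge 2$, and the name `gamma_value` is ours.

```python
from fractions import Fraction

def gamma_value(x, i, j):
    """Evaluate the recursive function of Lemma 7 exactly (assumes j >= 2, i <= j)."""
    x = Fraction(x)
    total = Fraction(0)
    while i < j:
        if x > Fraction(1, i + 1):
            # Greedily take an item of weight 1/i occupying just over 1/(i+1).
            total += Fraction(1, i)
            x -= Fraction(1, i + 1)
        else:
            i += 1
    # Base case i == j: remaining space is filled with the smallest items.
    return total + Fraction(j, j - 1) * x
```

Evaluating with $x = 1$, $i = 1$, $j = 43$ gives $1 + \frac12 + \frac16 + \frac1{42} + \frac1{1764} \approx 1.69104$, which is already within $2 \cdot 10^{-5}$ of $h_\infty$ and illustrates the limit stated above.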

Using the machinery we have developed here, it is also easy to analyze the 
performance ratios of Refined Harmonic and Modified Harmonic. One 
merely need evaluate V; this is easily accomplished even by hand calculation. 

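We now turn to Super Harmonic. We abuse notation and define $\mathbf{w}_i =
\mathbf{w}_{SH}(t_i)$ for the remainder of our discussion. Fix a $k \in \{1, \ldots, K + 1\}$ and define

$$\mathbf{s} = \mathbf{b}_0 + \sum_{i=k}^{K} \mathbf{r}_i + \sum_{i=1}^{K} \mathbf{b}_i, \qquad \mathbf{t} = \mathbf{b}_0 + \sum_{i=1}^{K} \mathbf{r}_i + \sum_{i=1}^{k-1} \mathbf{b}_i.$$

First note that $\min\{\mathbf{x} \cdot \mathbf{s}, \mathbf{x} \cdot \mathbf{t}\} = \mathbf{x} \cdot \mathbf{t}$ for $k = 1$ and $\min\{\mathbf{x} \cdot \mathbf{s}, \mathbf{x} \cdot \mathbf{t}\} = \mathbf{x} \cdot \mathbf{s}$
for $k = K + 1$. In these two cases, we can apply Lemma 6 and the performance
ratio is upper bounded by the value of $\mathcal{P}$.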

We now turn to $2 \le k \le K$. Consider the mathematical program: Maximize

$$\min\{ (z\,\mathbf{x} + (1 - z)\,\mathbf{x}') \cdot \mathbf{s},\ (z\,\mathbf{x} + (1 - z)\,\mathbf{x}') \cdot \mathbf{t} \}$$

subject to

$$z \in [0, 1];$$

$$\mathbf{x} = \frac{y}{1 - \epsilon}\, \mathbf{b}_0 + \sum_{i=1}^{n} q_i \mathbf{w}_i; \qquad \mathbf{x}' = \frac{y'}{1 - \epsilon}\, \mathbf{b}_0 + \sum_{i=1}^{n} q'_i \mathbf{w}_i;$$

$$y = 1 - \sum_{i=1}^{n} q_i t_{i+1}; \qquad y' = 1 - \sum_{i=1}^{n} q'_i t_{i+1};$$

$$y > 0; \qquad y' > 0;$$

$$q_j \in \mathbb{N}, \quad q'_j \in \mathbb{N}, \quad \text{for } 1 \le j \le n;$$

over variables $z, \mathbf{x}, \mathbf{x}', y, y', q_1, \ldots, q_n, q'_1, \ldots, q'_n$. Call this program $\mathcal{P}_k$. By
Lemma 5, if we show that the value of $\mathcal{P}_k$ is at most $c$ for all $k$, the asymptotic
performance ratio of Super Harmonic is at most $c$. The two patterns are $q$
and $q'$, their weights are $\mathbf{x}$ and $\mathbf{x}'$, respectively. The constraints guarantee that
$q$ and $q'$ are legitimate patterns. Conversely, the reader should verify that the
constraints allow all possible patterns. The distribution between $q$ and $q'$ is given
by $z$. Again, $\mathcal{P}_k$ is a type of knapsack problem: The adversary must pack items
into two knapsacks of size 1. The profit is found by applying the consolidation
function to a convex combination of the weights in the two knapsacks.
We have developed a branch and bound algorithm for solving $\mathcal{P}_k$. We implemented
this algorithm in C++. To ensure the validity of our results, all calculations
are done using the GNU CLN infinite precision rational arithmetic
package. A front end program is written in Mathematica. An explanation of the
algorithm along with a listing of the code appears in the appendix. The program
is available on the World Wide Web at

http://www.csc.lsu.edu/~seiden/super_harmonic.tgz



5 Results 

Details of Harmonic++ are given in the appendix. Using the methods outlined
in the preceding sections, we are able to show our main results:

Theorem 1. The asymptotic performance ratio of Harmonic++ is at most
158889/100000.

Theorem 2. The asymptotic performance ratio of Harmonic+1 is at least
1.59217.

Due to space considerations, the proofs are given in the appendix.

Using our computer program, we have also verified the upper bounds for 
Harmonic, Refined Harmonic, Modified Harmonic and Modified Har- 
monic 2. 

6 Conclusions 

We have developed a uniform method of analysis for online bin packing algorithms.
Using our framework, we have analyzed the Super Harmonic algorithm.
Online bin packing algorithms based on Harmonic are just special
instances of this general algorithm. We have developed an instance of Super
Harmonic, called Harmonic++, which has the best performance of any online
bin packing algorithm to date. Our framework is easily adapted to closely
related problems, such as variable-sized bin packing and resource augmented
bin packing.

The question of how to design a Super Harmonic algorithm is still an open 
one. The problem of choosing interval breakpoints is at the heart of this question. 
Once this choice is made, the remaining parameter values can be optimized by mathematical (in
some cases linear) programming. The solution to the breakpoint choice problem
seems to be currently out of our reach. The results of Ramanan et al. imply
that any instance of Super Harmonic has performance ratio at least 19/12 > 
1.58333. Further, the set of intervals designed by Richey works very well, despite 
the ad-hoc design. We did not find another set which would significantly improve 
performance. An understanding of the breakpoint problem will not lead to a large 
improvement in performance, or bring us close to the lower bound of 1.54014. 
Still, despite the inherent limitations of our approach, it is our hope that this 
work brings us one step closer to a final resolution of the online bin packing 
problem. 
Abstract. Solving an open problem of Jain and Vazirani [FOCS’99],
we present $\tilde{O}(n + m)$ time constant factor approximation algorithms for
the $k$-median, $k$-center, and facility location problems with assignment
costs being shortest path distances in a weighted undirected graph with
$n$ nodes and $m$ edges.

For all of these location problems, $\tilde{O}(n^2)$ algorithms were already known,
but here we are addressing large sparse graphs. An application could be
placement of content distributing servers on the Internet. The Internet is
large and changes so frequently that an $\tilde{O}(n^2)$ time solution would likely
be outdated long before completion.



1 Introduction 

We consider several classical (discrete) location problems defined in terms of
a metric $(P, \mathrm{dist})$, $|P| = n$. We want to pick a set $S \subseteq P$ of facilities subject to
different objectives. In the $k$-median and $k$-center problems we require $|S| = k$,
and our goal is to minimize $\sum_{x \in P} \mathrm{dist}(x, S)$ and $\max_{x \in P} \mathrm{dist}(x, S)$, respectively.

In the facility location problem, there is no limit $k$ on the number of facilities,
but we are further given a facility cost function $\text{f-cost} : P \to \mathbb{N}_0$, and our goal is
to minimize $\sum_{a \in S} \text{f-cost}(a) + \sum_{x \in P} \mathrm{dist}(x, S)$.

In this paper, we are interested in the graph setting where the metric is the
shortest path metric of a weighted undirected connected graph $G = (V, E, \ell :
E \to \mathbb{N})$, $|V| = n$, $|E| = m \ge n - 1$, that is, $\mathrm{dist}(x, y)$ is the length of the shortest
path from $x$ to $y$ in $G$. This setting is the basis for many of the classical
applications of facility location in operations research. For example, the
problems can then model placement of shopping centers on a road network with
driving distance to the nearest shopping center as the consumer cost. Also, they can
model placement of content distribution servers on the Internet. Both examples
concern large sparse graphs. Further, the Internet changes frequently and hence
an algorithm has to be fast in order to produce up-to-date answers. In particular,
we cannot wait for months for an all-pairs shortest paths computation to finish.

For all of the above location problems, we present $\tilde{O}(m)$ time constant factor
approximation algorithms. Here $\tilde{O}$ means that we ignore $\log n$ factors. The concrete
approximation factors are $12 + o(1)$ for $k$-median, 2 for $k$-center, $3 + o(1)$
for facility location. The approximation factor for the $k$-median problem may
be reduced to around 9, but this is complicated and beyond the scope of the
current paper.

Our results solve an open problem of Jain and Vazirani. They considered
a distance oracle setting where given any pair of points $(x, y) \in P^2$, one
can compute $\mathrm{dist}(x, y)$ in constant time. For the $k$-median and facility location
problems, they achieved approximation factors of 6 and 3, respectively, in $\tilde{O}(n^2)$
time$^1$, improving for the $k$-median the LP-based factor $6\frac{2}{3}$. They noted
that “The distinguishing feature of our algorithms is their low running time”. In
their final discussion they ask if improved running times can be obtained for the
graph version in the case of sparse graphs, as done near-optimally in this paper.
We note that the distance oracle setting is interesting in its own right, e.g. interpreting
the location problems as clustering problems for large data bases.
As it turns out, a by-product of our work is some much improved approximation
factors in “sub-linear” $o(n^2)$ time in the distance oracle setting.

It should be appreciated that the graph and distance oracle models are fundamentally
different in that a single distance query in a graph takes $O(m)$ time.
For a sparse graph, Jain and Vazirani would first compute the shortest path
metric with an all pairs shortest path computation, which takes $\tilde{O}(nm)$ time.
Our improvement to $\tilde{O}(m)$ time is obtained using only a polylogarithmic number
of single source shortest path computations. We note that it is easy to get
an $\tilde{O}(n^2)$ time solution using approximate shortest path computations,
but the approximation factor becomes worse than ours. More importantly,
as in our example applications, we are mostly interested
in large sparse graphs with $m = \tilde{O}(n)$, and then our $\tilde{O}(m)$ time is a strong
improvement.

The pride of this paper is the solution to the k-median problem, the hardness
being the sharp bound on the number of facilities. Facility location is
comparatively trivial because we can use approximate facility costs. It is
considered here for completeness because it was part of Jain and Vazirani’s
open problem [13]. The k-center solution comes in for free as a warm-up for our
solution to the facility location problem. Also, by covering k-median,
k-center, and facility location, we develop a quite general tool-box for basic
location problems in networks [?].

Henceforth, the paper is focused on the k-median problem, leaving k-center
and facility location to two independent sections at the end (cf. Sections 5
and 6).

The distance oracle version. Our work on the graph version of the k-median
problem is inspired by the progress on the distance oracle version: Indyk [?]
has presented a randomized reduction that, together with the Õ(fn) time factor
6 approximation algorithm of Jain and Vazirani [13], implies a Õ(k^? n) time
factor (3 + o(1))(2 + 6) = 24 + o(1) approximation algorithm, though cheating a
bit using 2k facilities. Also, as part of their work on an on-line version of
the k-median problem, Mettu and Plaxton [?] presented an O(n^2) time algorithm
with an approximation factor slightly below 40. Finally, based on Indyk’s
construction [?], Guha et al. [?] presented a Õ(kn) time factor
6 × 2 × (24 + o(1) + 1) = 300 + o(1) approximation algorithm. Their algorithm
works for a special streaming version of the problem if distances between
points can be computed directly from the points. Their approximation factor can
be reduced to 80 + o(1) if k = O(√n) [Guha, personal communication]. An Õ(kn)
time constant factor approximation is also announced by Mettu and Plaxton [?],
but with no specification of the constant. They also pointed out that Ω(kn)
time is necessary even for randomized algorithms.

^1 Actually, they get Õ(fn) time if only f points are potential facilities, but
here we generally assume all points are potential facilities.

Some of our initial developments actually imply an Õ(kn) time factor 12 + o(1)
approximation for the distance oracle version of the k-median problem. This is
better than any previous o(n^2) time algorithm, and 25 times better than the
previous Õ(kn) time algorithm for unbounded k. Clearly some of the previous
approximation factors could have been improved somewhat, but our vast
improvement is due to some new simple and powerful sampling techniques.
Corresponding improvements can also be obtained in the streaming model
considered in [?] (historically this paper actually predates [?] slightly, but
had a less fortunate conference history).



The graph version. The previous work on the distance oracle version does have
some applications for the graph version. Applying an all pairs small-stretch
path algorithm of Cohen and Zwick [?], we can approximate all distances in the
graph within a factor 3 in Õ(n^2) time, and then we can apply Jain and
Vazirani’s algorithm in Õ(n^2) time to get a factor 3 × 6 = 18 approximation
algorithm. In order to exploit the Õ(kn) time distance oracle algorithm of Guha
et al. [?], we can first apply Thorup and Zwick’s [?] approximate distance
oracle algorithm: for any positive integer t, after O(t m n^{1/t})
preprocessing time, we can answer distance queries within a factor 2t - 1 in
O(t) time. Setting t = 2 and combining with [?], we get an Õ(m√n + nk) time
algorithm with approximation factor 3(300 + o(1)) = 900 + o(1), or
3(80 + o(1)) = 240 + o(1) if k = O(√n).

Our new Õ(m) time bound is near-optimal and breaks the Ω(kn) lower bound for
distance oracles if k ≫ m/n. Further, our approximation factor is only
12 + o(1). This is better than any previous o(nm) time solution. It appears
the approximation factor can be further reduced to around 9.

Our approach is easily generalized to work for weighted points. One
application of this is if only a subset of the points are potential facilities.
Then we first assign each point to its nearest potential facility. Second, we
solve the k-median problem over the potential facilities, each weighted by the
number of assigned original points. The resulting solution can be seen to be a
factor 2(12 + o(1)) + 1 = 25 + o(1) approximation.



Other metrics. Our techniques imply that the k-median problem can be solved
with Õ(n) nearest neighbor queries. For Hamming space this implies a constant
factor approximation in Õ(n^{1+ε}) time for ε > 0 [?], thus beating the Ω(kn)
lower bound for general distance oracles if k ≫ n^ε. We note that large values
of k are relevant to fine grained clustering.

(Im)practicality and outline. Our Õ(m) solution to the k-median problem in
graphs is exceedingly complicated, and hence unlikely to be of practical
relevance. However, in our developments we present several algorithms that are
simple and easy to implement, yet provide stronger bounds than were previously
known. Also, our algorithms for the k-center and facility location problems
are simple and easy to implement, and it is all these simpler algorithms that
constitute the practical contribution of the paper.

First, in Section 2 we present a simple fast randomized algorithm for selecting
a set of Õ(k) potential facilities, guaranteed to contain a solution with k
facilities with at most twice the cost of the optimal solution. Using this as a
preprocessing step for Jain and Vazirani’s algorithm, we find a factor
12 + o(1) approximation in Õ(kn) time for distance oracles, and Õ(km) time for
graphs. This part is considered very practical, “perfect” for distance oracles,
and good for graphs if k is not too large.

Next, in Section 3 we show that it can be meaningful to apply the algorithm
from [13] to a sparse graph whose weights do not satisfy the triangle
inequality, and construct a graph for which this makes sense. In Õ(m) time,
this leads to a factor 12 + o(1) approximation to the k-median problem but
cheating using k + k/log^2 n facilities. This part is still simple enough to be
of practical use, and good enough if the bound on the number of facilities is
not sharp.

The true difficulty in the k-median problem is the sharp bound on the
facilities. In Section 4 we sketch a very convoluted recursion for getting rid
of the last k/log^2 n extra facilities, leaving the details to the journal
version. It is based on structural theorems showing that if we cannot easily
get rid of the k/log^2 n extra facilities, it is because our current solution
is very similar to any optimal solution. This similarity allows us to fix most
of the facilities and recurse on an o(k)-median problem. This last step is
considered too complicated to be of practical interest, but it takes us to our
theoretical goal: a near-linear time constant factor approximation to the
k-median problem.

Notation and terminology. We are dealing with a metric (P, dist), |P| = n, from
which we pick a subset S of facilities. Further, each point x ∈ P is assigned
a facility x^S ∈ S at cost dist(x, x^S). The total cost of S is then
cost(S) = Σ_{x∈P} dist(x, x^S). Unless otherwise stated, we assume that x^S is
a point in S nearest to x. Then the assignment cost for x is
dist(x, S) = min_{a∈S} dist(x, a), and then the total cost of S is
cost(S) = Σ_{x∈P} dist(x, S).

By evaluating a set S ⊆ P, we mean that for each x ∈ P, we find its nearest
point x^S in S, compute dist(x, x^S) = dist(x, S), and finally, compute cost(S).

Observation 1. In an undirected (or directed) weighted graph, we can evaluate
a set S in O(m) time (O(m log log n) time).

Proof. Introduce a source s with zero length edges to all a ∈ S, and compute
the shortest path tree to all nodes in O(m) time [?]. For each x, dist(x, S) is
the found distance from s, and cost(S) is the sum of these distances. Finally,
for each x in the subtree of a ∈ S, we set x^S = a. The same construction works
for directed graphs if we first reverse the direction of all edges and use the
directed shortest path algorithm from [?].

Note that if, for some reason, we want to run on a restricted machine model
such as the pointer machine, we get an O(n log n + m) time bound in
Observation 1 from [?].
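The evaluation step of Observation 1 can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a binary-heap Dijkstra instead of the linear-time shortest path algorithms cited above, so it runs in O(m log n) rather than O(m) time, and the adjacency-list format and the name `evaluate` are hypothetical choices rather than anything fixed by the paper.

```python
import heapq

def evaluate(adj, S):
    """Evaluate a facility set S in an undirected weighted graph.

    adj maps each vertex to a list of (neighbor, length) pairs.
    Returns (nearest, dist, cost), where nearest[x] is x^S, dist[x] is
    dist(x, S), and cost is cost(S). Seeding the heap with every facility
    at distance 0 plays the role of the extra source s with zero-length
    edges to all of S.
    """
    dist = {v: float("inf") for v in adj}
    nearest = {v: None for v in adj}
    heap = []
    for a in S:
        dist[a] = 0.0
        nearest[a] = a
        heap.append((0.0, a, a))
    heapq.heapify(heap)
    while heap:
        d, v, a = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for w, length in adj[v]:
            nd = d + length
            if nd < dist[w]:
                dist[w] = nd
                nearest[w] = a  # w lies in the subtree of facility a
                heapq.heappush(heap, (nd, w, a))
    return nearest, dist, sum(dist.values())
```

A single call therefore yields x^S, dist(x, S) and cost(S) for all points at once, which is the only access to the graph that the later sampling steps need.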

The k-median problem is the problem of finding S ⊆ P, |S| ≤ k, minimizing
cost(S). Define k-mediancost^F = min{cost(S) : S ⊆ F, |S| = k} and
k-mediancost = k-mediancost^P. By a c-approximation to the k-median problem,
we mean a solution S ⊆ P, |S| = k, with cost(S) ≤ c × k-mediancost.

By the S-cluster of a ∈ S, we mean the set {x ∈ P : x^S = a}, denoted
cluster(a, S).

Generally, we will use a subscript to denote that measurements are done in a
metric different from the one currently understood. For example, if H is a
graph, dist_H(x, y) is the distance from x to y in H.

2 Sampling k log^{O(1)} n Facilities

In this section, we will prove

Theorem 1. For 0 < ε < 0.4, with probability at least 1/2, by sampling and
evaluating O(log n) sets of size O(k/ε log n), we can identify a set F of size
O(k/ε log^2 n) such that k-mediancost^F ≤ (2 + ε) × k-mediancost.

Our constructive proof of Theorem 1 is based on the following simple algorithm:

Algorithm A
A.1. R := P; S := ∅;
A.2. while |R| > k/ε log^2 n do
A.2.1. add 4k/ε log n random points from R to S
A.2.2. pick a random t ∈ {1, ..., |R|} and remove from R the t points with
lowest distance to S.
A.3. return F = S ∪ R.
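Algorithm A is short enough to write out directly. The sketch below is a hedged illustration for the distance oracle setting: the function name `sample_facilities`, the `dist` callback, and the use of base-2 logarithms are assumptions made here, and the nearest-facility distances are maintained by brute force rather than by the evaluation routine of Observation 1.

```python
import math
import random

def sample_facilities(points, dist, k, eps, rng=random):
    """Return a candidate facility set F = S ∪ R as in Algorithm A.

    points: list of hashable points, dist(x, y): metric distance oracle.
    """
    n = len(points)
    log_n = max(1.0, math.log2(n))
    batch = math.ceil(4 * k / eps * log_n)   # sample size in step A.2.1
    threshold = k / eps * log_n ** 2         # loop bound in step A.2
    R = list(points)                         # A.1
    S = []
    d_to_S = {x: float("inf") for x in R}    # dist(x, S) for x in R
    while len(R) > threshold:                # A.2
        new = rng.sample(R, min(batch, len(R)))        # A.2.1
        S.extend(new)
        for x in R:
            d_to_S[x] = min(d_to_S[x], min(dist(x, a) for a in new))
        R.sort(key=d_to_S.__getitem__)       # increasing distance to S
        t = rng.randint(1, len(R))           # A.2.2: random prefix length
        R = R[t:]
    return list(dict.fromkeys(S + R))        # A.3, duplicates removed
```

For small inputs the threshold k/ε log^2 n exceeds n, so the loop never runs and F = P; the sampling only starts to pay off once n is large compared with k/ε log^2 n.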

Proof (of Theorem 1). First we note that the probability that Algorithm A
terminates in ω(log n) iterations is 1/n^{ω(1)} (each round reduces |R| by a
factor 2 with probability 1/2 and we can only do this log_2 n times), so we may
assume termination in O(log n) iterations.

Let OPT be an optimal solution to the k-median problem. Our claimed good
solution inside F will be OPT^F = {a^F}_{a∈OPT}. It is trivially of size k, and
for our analysis, we will assign each x ∈ P to (x^OPT)^F.

A point x ∈ P will be declared “happy” as soon as a point a with
dist(a, x^OPT) ≤ dist(x, x^OPT) is picked for S, to later end up in F. Clearly,
if all points ended up happy, we would get cost(OPT^F) ≤ 2cost(OPT).
Unfortunately, we cannot hope to make all points happy, but we will find a way
to pay for the unhappy ones.

Consider an unhappy point x. Our assignment cost for x is

  dist(x, (x^OPT)^F) ≤ dist(x, x^OPT) + dist(x^OPT, (x^OPT)^F)
                     ≤ dist(x, x^OPT) + dist(x^OPT, x^F)
                     ≤ dist(x, x^OPT) + dist(x^OPT, x) + dist(x, x^F)
                     = 2dist(x, OPT) + dist(x, F).

The point x can itself pay 2dist(x, OPT), so it only remains to show that the
sum of dist(x, F) over all unhappy x is bounded by ε cost(OPT).

Consider an iteration of the algorithm. Let S and R be the values of S and
R after step A.2.1, and let U be the prefix of R removed in step A.2.2. We will
show that the expected fraction of unhappy points in R and U is very small and
based on this we will show that the happy points can pay for them.

Claim 1. The expected fraction of unhappy points in R after step A.2.1 is
≤ ε/(4 log n).

Proof. Consider a point x ∈ R which was not happy before step A.2.1, and
let C be the remaining part of the OPT-cluster containing x, that is,
C = cluster(x^OPT, OPT) ∩ R.

Suppose there are i points in C, including x, that are as close to x^OPT
as x. Now, the probability that x is not turned happy by step A.2.1 is
(1 - i/|R|)^{4k/ε log n} ≤ e^{-i(4k/ε log n)/|R|}. Thus, no matter the size of
C, the expected number of unhappy points in C is at most
Σ_{i≥1} e^{-i(4k/ε log n)/|R|} < ∫_0^∞ e^{-x(4k/ε log n)/|R|} dx
= |R|/(4k/ε log n), so with k clusters, the expected fraction of unhappy
elements in R is at most 1/(4/ε log n) = ε/(4 log n).

Claim 2. If f is the fraction of unhappy points before step A.2.2 removes the
random prefix U of R, the probability that

  Σ_{unhappy x∈U} dist(x, S) ≤ ε Σ_{happy x∈U} dist(x, S)/2        (1)

is not satisfied is at most f(2 + 2/ε).

Proof. Let the points in R be sorted in order of increasing distance to S. We
are going to delete the first t points in this sequence with t randomly chosen
from {1, ..., |R|}. Before this happens, traverse R and let each unhappy point
grab the first ⌈2/ε⌉ happy points of higher distance that are not grabbed yet,
if any. Now, if point t is neither unhappy nor grabbed, then each unhappy point
among the first t points has grabbed its own ⌈2/ε⌉ happy points to pay for it,
and hence (1) is satisfied. The fraction of points that are unhappy or grabbed
by an unhappy point is at most f(⌈2/ε⌉ + 1) ≤ f(2 + 2/ε).

By Claims 1 and 2, the probability that (1) is not satisfied for a given
iteration is E(f)(2 + 2/ε) ≤ (1 + ε)/(2 log n). Since the expected number of
iterations is ≤ H_n, the probability that (1) is false for any iteration is
≤ (1 + ε)H_n/(2 log n) < 1/2 for ε < 0.4 and n → ∞. Thus, we may assume that
(1) is satisfied over all iterations.
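The role of the constant 0.4 can be checked numerically. The following sketch assumes that log denotes the base-2 logarithm and that H_n is the n-th harmonic number, so that H_n/log_2 n tends to ln 2; the function name `failure_bound` is introduced here only for illustration.

```python
import math

def failure_bound(n, eps):
    """Evaluate (1 + eps) * H_n / (2 * log2(n)), the bound on the
    probability that inequality (1) fails in some iteration."""
    harmonic = sum(1.0 / i for i in range(1, n + 1))
    return (1 + eps) * harmonic / (2 * math.log2(n))

# Limiting value of the bound for eps = 0.4 as n grows: (1 + eps) * ln(2) / 2.
limit_at_0_4 = 1.4 * math.log(2) / 2
```

Under these assumptions the limiting value at ε = 0.4 is about 0.485, just below 1/2, which is consistent with the requirement ε < 0.4 in Theorem 1. For moderate n the lower-order term of H_n still matters, so the bound drops below 1/2 only for somewhat smaller ε or larger n.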

Since dist(x, S) ≥ dist(x, F) for any unhappy x, and dist(y, S) ≤
2dist(y, OPT) for any happy y, (1) implies Σ_{unhappy x∈U} dist(x, F) ≤
ε Σ_{happy x∈U} 2dist(x, OPT)/2. However, since the sets U from different
iterations are disjoint, we get Σ_{unhappy x∈P} dist(x, F) ≤
ε Σ_{happy x∈P} 2dist(x, OPT)/2 ≤ ε cost(OPT), as desired.

It should be noted that the above construction works even if OPT is allowed
to use points not in P, and in this case, we cannot get below 2 (think of OPT
being the center of a star and the center not being in P). Also, note that for
each a ∈ OPT, all points in cluster(a, OPT) are assigned to the same facility
in F. Hence the construction works even if the facilities have the same limited
capacity.

Corollary 1. For a general metric we can find a (12 + o(1))-approximation for
the k-median problem, using Õ(kn) distance queries and computation time.

Proof. By the above theorem and lemma, we spend Õ(kn) distance queries on
finding a set of Õ(k) relevant facility locations, and then the algorithm from
[13] allows us to solve the k-median problem in Õ(kn) time.

3 Reducing to k + k/log^2 n Facilities

In this section, we will show that we can get down from k log^{O(1)} n potential
facilities to a solution S with k + k/log^2 n facilities. We wish to apply the
techniques of Jain and Vazirani [13], but these techniques are only quoted as
working for graphs satisfying the triangle inequality. More precisely, let F be
the set of potential facilities. They assume all edges in F × P, and that each
edge (a, x) ∈ F × P has length ℓ(a, x) = dist(a, x). If |F| = log^{ω(1)} n,
this is too much for our time bounds.

We will now discuss what happens when the algorithm from [13] is applied to a
graph G that does not satisfy the triangle inequality. Define
ℓ-cost(S) = Σ_{x∈P} ℓ(x, x^S). Here ℓ(x, x^S) = ∞ if (x, x^S) is not an edge in
G. What the algorithm from [13] really finds is a k-median S ⊆ F with
cost(S) ≤ 6 × k-median-ℓ-cost^F. The point is that the dual variables providing
the lower bound in [13] do not require the triangle inequality. It is only used
in bounding the gap between the primal solution and the dual variables.

Concerning speed, the algorithm in [13] works in Õ(m) time for m edges,
except for the rounding. We will modify the simple rounding from [13, §3.3] to
make it efficient. The simple rounding from [13, §3.3] operates on two subsets
A and B of F with |A| < k and |B| > k. First they let each a ∈ A pick its
nearest un-picked b ∈ B. Afterwards, they pick k - |A| random facilities among
the remaining |B| - |A| facilities in B. However, picking the nearest un-picked
facility is not easy. Instead, we evaluate B once in O(m) time. For each a ∈ A,
this gives us the nearest facility a^B ∈ B. We now first pick
A^B = {a^B | a ∈ A} and second we pick a random subset of B \ A^B of size
k - |A^B|. To see that the analysis in [13, §3.3] still goes through, we note
that the probability that b ∈ B \ A^B is picked is
(k - |A^B|)/(|B| - |A^B|) ≥ (k - |A|)/(|B| - |A|), since |A^B| ≤ |A|.

Since we do not use the improved rounding in [13, §3.5], the overall
approximation factor is worsened by a factor (1 + o(1)). In conclusion, we get

Theorem 2. In Õ(m) time we can find S ⊆ F, |S| = k, with
cost(S) ≤ (6 + o(1)) × k-median-ℓ-cost^F.

To apply this theorem, we construct a graph G^F with Õ(n) edges (v, w), each
with ℓ(v, w) = dist(v, w), and with k^+-median-ℓ-cost^F_{G^F} =
O(k-mediancost^F) where k^+ = k + k/log^2 n. Applying Theorem 2 then gives us a
k^+-median S ⊆ F with cost(S) ≤ ℓ-cost_{G^F}(S) = O(k-mediancost^F).

The construction of G^F is rather simple based on F. Set
d = log^3 n |F|/k = log^{O(1)} n. For each point x ∈ P, we include an edge to
each of the d nearest neighbors in F.

Lemma 1. The graph G^F has a (k + k/log^2 n)-median S ⊆ F with
ℓ-cost(S) ≤ k-mediancost^F.

Proof. All we need is to show the existence of a set D ⊆ F of size k/log^2 n
which dominates in the sense that for each x, x^D is one of its d nearest
neighbors. We then set S = OPT ∪ D where OPT is the optimal solution. If x^OPT
is one of the d nearest neighbors in F, x^S = x^OPT. Otherwise, x^S = x^D and
then ℓ(x, x^D) ≤ dist(x, OPT), so ℓ-cost(S) ≤ cost(OPT).

To show the existence of D, we just pick D randomly. For each x ∈ P, the
probability that none of its d nearest neighbors are picked is
≤ (1 - |D|/|F|)^d < e^{-log n} < 1/n, so there is a positive probability that
this does not happen for any x.

For the construction of G^F, we need

Lemma 2. With high probability, using O(d log n) evaluations, we can find the
d nearest neighbors in F to each point x.

Proof. We pick each a ∈ F with probability 1/(2d) for a set Q that we evaluate.
For each x ∈ P and each i ≤ d, the probability that x^Q is the i-th nearest
neighbor in F is ≥ (1 - 1/(2d))^{i-1}/(2d) ≥ 1/(4d). Hence, in O(d log n)
evaluations, we can find the d nearest neighbors of all x with high
probability.
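The sampling argument of Lemma 2 can be illustrated with a small sketch. It is written for a distance oracle, evaluates each sample Q by brute force, assumes that the distances from a point to the facilities are distinct, and takes the number of rounds as an explicit parameter; the name `d_nearest_by_sampling` and its signature are hypothetical.

```python
import random

def d_nearest_by_sampling(points, F, dist, d, rounds, rng=random):
    """For each point x, collect the nearest sampled facility x^Q over
    many random samples Q of F (each facility kept with probability
    1/(2d)), then report the d closest facilities seen for x."""
    found = {x: set() for x in points}
    for _ in range(rounds):
        Q = [a for a in F if rng.random() < 1.0 / (2 * d)]
        if not Q:
            continue
        for x in points:
            # "Evaluating" Q: find x^Q, the facility of Q nearest to x.
            found[x].add(min(Q, key=lambda a: dist(x, a)))
    return {x: sorted(found[x], key=lambda a: dist(x, a))[:d] for x in points}
```

Each of the d nearest neighbors of x is reported as x^Q in a given round with probability at least 1/(4d), so with rounds on the order of d log n all of them are seen with high probability, and the truncated sorted list is then exactly the set of d nearest neighbors.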

Theorem 3. With probability 1 - O(1/n), in Õ(m) time, we can construct a
(k + k/log^2 n)-median S with cost(S) ≤ (12 + o(1)) × k-mediancost.

Proof. First, using Theorem 1, we identify a set F of size k log^{O(1)} n with
k-mediancost^F ≤ (2 + o(1)) k-mediancost. Then, using the above lemmas, we
construct G^F with (k + k/log^2 n)-median-ℓ-cost_{G^F} ≤ k-mediancost^F and
finally, we apply Theorem 2 to G^F.

To get the low error probability, we repeat the above construction O(log n)
times, returning the S minimizing cost(S). Note that this could not be done for
Theorem 1 because we cannot compute and compare k-mediancost^F.

4 Recursing Down to k Facilities

Let OPT denote some optimal solution to the k-median problem. Our starting
point is a solution S, as from Theorem 3, of satisfactory cost, that is,
cost(S) = O(cost(OPT)), but which uses q = k/log^2 n too many facilities. We
will then first try greedily to remove q facilities from S. If this cannot be
done easily at cost o(cost(S)), we will be able to fix many universally good
facilities and then recurse. This recursion is by far the hardest and most
technical part of the paper, and a decent presentation would take about 8 pages
in the current format. For space reasons, we defer this to the journal version.

5 k-Center

We want to pick S ⊆ V, |S| = k, minimizing max_{v∈V} dist(v, S). A factor 2
approximation is classical [?], and best possible [?], but the natural
algorithm takes O(km) time. We get down to Õ(m) time, and our methods will be
reused for facility location.

The classical factor 2 approximation is the following greedy algorithm: guess
the optimal distance d* and then return any maximal dist-2d* independent set.
Here, for any d, a subset U ⊆ V is dist-d independent if no two vertices of U
are within distance d of each other. We know we have an adequate value of d if
it gives rise to ≤ k facilities whereas d - 1 gives rise to > k facilities, so
d may be found with a binary search.

The obvious greedy way of finding a maximal dist-d independent set U is as
follows: set U = ∅ and W = V. While W ≠ ∅, add an arbitrary vertex v ∈ W
to U and remove all vertices within distance d of v from W. The complexity of
this algorithm is O(|U|m). Here we will get down to near-linear time. To the
best of our knowledge, no o(mn) time algorithm existed for finding maximal
dist-d independent sets.
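The classical greedy scheme described above can be sketched as follows. This is the slow O(|U|m)-style baseline, not the near-linear algorithm developed next; it assumes a connected graph with positive integer edge lengths, and the names `ball`, `greedy_independent` and `k_center` are chosen here for illustration only.

```python
import heapq

def ball(adj, v, d):
    """Vertices within distance d of v (Dijkstra truncated at d)."""
    dist = {v: 0}
    heap = [(0, v)]
    while heap:
        dv, u = heapq.heappop(heap)
        if dv > dist[u]:
            continue  # stale heap entry
        for w, length in adj[u]:
            nd = dv + length
            if nd <= d and nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return set(dist)

def greedy_independent(adj, d):
    """Greedy maximal dist-d independent set of all vertices."""
    U, W = [], set(adj)
    while W:
        v = min(W)               # any vertex of W works; min is deterministic
        U.append(v)
        W -= ball(adj, v, d)     # drop everything within distance d of v
    return U

def k_center(adj, k):
    """Binary search for d such that d yields <= k facilities while
    d - 1 yields > k, then return the set found for d."""
    if len(adj) <= k:
        return list(adj)
    lo = 0                                                   # yields n > k
    hi = sum(length for v in adj for _, length in adj[v])    # yields 1 <= k
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if len(greedy_independent(adj, mid)) <= k:
            hi = mid
        else:
            lo = mid
    return greedy_independent(adj, hi)
```

The binary search keeps the invariant that lo gives more than k facilities and hi gives at most k, so it stops at an adequate d even though greedy set sizes need not be monotone in d.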

We shall use the following result of Cohen [?]:

Lemma 3 (Cohen). For any W ⊆ V, after Õ(m) preprocessing, for any vertex
v and d > 0, in Õ(1) time, we can estimate within a factor 1 ± o(1) the number
of vertices from W within any distance d of v.

Proposition 1. We can find a maximal dist-d independent set of any set
W ⊆ V in Õ(m) time.

Proof. As in the above greedy algorithm, start by setting U = ∅. While W ≠ ∅,
do as follows. Using Lemma 3, compute δ = (1 ± o(1)) max_{v∈W} |N_{≤d}(v) ∩ W|.
Pick a random subset R of W of size |W|/δ. Let
T = {v ∈ R | N_{≤d}(v) ∩ R = {v}}. Add T to U and remove all vertices from W
within distance d from U.

To identify the set T, construct a family {R_i}_{i ≤ 2 log_2 |R|} of subsets of
R such that for all v, w ∈ R there is a set R_i containing v but not w. These
sets may be constructed by associating different log_2 |R| bit vectors to the
vertices in R, and then characterizing each set by the value of a certain bit
position. Now, N_{≤d}(v) ∩ R = {v} if and only if dist(v, R_i) > d for each R_i
not containing v.

The idea in the above construction is that, within O(log n) rounds, it reduces
max_{v∈W} |N_{≤d}(v) ∩ W| by a constant factor. More precisely, we show that if
for some vertex v ∈ W, |N_{≤d}(v) ∩ W| ≥ δ/2, the next round eliminates v
with constant probability. Clearly, the condition implies that some vertex
u ∈ N_{≤d}(v) ∩ W will be picked for R with constant probability. Further, by
choice of δ, |N_{≤d}(u) ∩ W| ≤ (1 + o(1))δ, so with constant probability, no
other vertex from N_{≤d}(u) ∩ W will be picked for R. Hence some
u ∈ N_{≤d}(v) ∩ W will end in T, eliminating v from W, with constant
probability.
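The round structure of this proof, including the bit-vector trick for identifying T, can be sketched as below. Two simplifications are assumptions of this illustration and not part of the construction above: δ is computed exactly by one truncated search per vertex instead of Cohen's estimator, and distances are computed with a heap-based Dijkstra, so the sketch does not achieve the Õ(m) bound. All function names are hypothetical.

```python
import heapq
import random

def dist_to_set(adj, sources):
    """Multi-source Dijkstra: distance from each vertex to its nearest source."""
    dist = {v: float("inf") for v in adj}
    heap = []
    for s in sources:
        dist[s] = 0
        heap.append((0, s))
    heapq.heapify(heap)
    while heap:
        dv, v = heapq.heappop(heap)
        if dv > dist[v]:
            continue  # stale heap entry
        for w, length in adj[v]:
            if dv + length < dist[w]:
                dist[w] = dv + length
                heapq.heappush(heap, (dv + length, w))
    return dist

def isolated_in_sample(adj, R, d):
    """T = {v in R : no other vertex of R is within distance d of v}."""
    index = {v: i for i, v in enumerate(R)}      # distinct bit vectors
    bits = max(1, (len(R) - 1).bit_length())
    alone = set(R)
    for b in range(bits):
        for value in (0, 1):
            Ri = [v for v in R if ((index[v] >> b) & 1) == value]
            if not Ri:
                continue
            dist = dist_to_set(adj, Ri)
            # v outside Ri with dist(v, Ri) <= d has a close neighbor in R.
            alone -= {v for v in R
                      if ((index[v] >> b) & 1) != value and dist[v] <= d}
    return alone

def maximal_independent(adj, W, d, rng=random):
    W, U = set(W), []
    while W:
        # Exact maximum neighborhood size (stand-in for Lemma 3).
        delta = max(
            sum(1 for w, x in dist_to_set(adj, [v]).items() if x <= d and w in W)
            for v in W
        )
        R = rng.sample(sorted(W), max(1, len(W) // delta))
        T = isolated_in_sample(adj, R, d)
        U.extend(sorted(T))
        near = dist_to_set(adj, T)
        W -= {v for v in W if near[v] <= d}
    return U
```

Any two distinct vertices of R differ in some bit position, so a close pair is always separated by one of the 2 log_2 |R| sets, which is why 2 log_2 |R| evaluations suffice to identify T.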

We note that the above proof has some similarities with the randomized parallel
independent set algorithm in [?]. However, the algorithm in [?] accesses all
edges whereas we do not want to consider the O(n^2) pairs of distance ≤ d.

Corollary 2. In Õ(m) time, we can find a factor 2 approximation to the
k-center problem.

In [?] are mentioned several similar problems that can be improved with similar
methods.

6 Facility Location 

In the facility location problem for a metric (P, dist), there is a facility
cost f-cost(x) associated with each x ∈ P, and then the cost of S ⊆ P is
Σ_{a∈S} f-cost(a) + Σ_{x∈P} dist(x, S). First we note that in the distance
oracle version of the facility location problem, even if all facility costs are
uniform, no constant factor approximation is possible with o(n^2) queries, not
even if we allow randomization. For a negative example, divide into clusters of
size t with intra- and inter-cluster distance 0 and ∞, respectively. Then we
expect to need Ω(n^2/t) queries for an approximation factor substantially below
t. It follows that Mettu and Plaxton’s [?] O(n^2) time bound is optimal for
facility location with distance oracles, even with uniform facility costs.

However, in [5], it is shown that Jain and Vazirani’s algorithm [13] can be
implemented with Õ(n) nearest neighbor queries, leading to a more efficient
solution in Hamming space [?]. We note that [?] does not give anything for the
k-median problem as it is based on approximate counting, hence approximate
payment of facilities, and then Jain and Vazirani’s rounding trick does not
work.

In a graph, we have no efficient way of supporting individual nearest neighbor
queries, but what we can do for a subset X ⊆ V of the vertices is for all
points v ∈ V to find their nearest neighbor v^X, that is, in graphs, we can
solve the all points nearest marked neighbor problem in O(m) time (cf. proof of
Observation 1). Essentially, we show below that facility location can be solved
within a factor 3 from optimality, with a polylogarithmic number of solutions
to the all points nearest neighbor problem. We note that whereas “phase 2” of
Jain and Vazirani’s algorithm [13] is trivial to implement with efficient
individual nearest neighbor queries [5], it needs something like Proposition 1
for graphs.

Instead of using Jain and Vazirani’s algorithm [13], we use the one of Mettu
and Plaxton [?]. The factor 3 approximation algorithm of Mettu and Plaxton
for facility location is very simple and elegant. Like the algorithm in [13],
it has two phases.

Phase 1. For each x ∈ P, we find r_x such that value(x, r_x) = f-cost(x) where
value(x, r) = Σ_{y∈P, dist(x,y)≤r} (r - dist(x, y)).

Phase 2. Starting with S = ∅, we visit x ∈ P in order of increasing r_x, adding
x to S if dist(x, S) > 2r_x.

Lemma 4 ([?]). The above set S is at most a factor 3 from the optimal solution
to the facility location problem.
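The two phases can be written down directly for a small metric given by a distance function. This sketch uses exact distances and brute-force searches, so it illustrates only the definition of r_x and the greedy selection, not the efficient graph implementation discussed next; positive facility costs are assumed, and the names `radius` and `facility_location` are hypothetical.

```python
def radius(x, points, dist, f_cost):
    """Phase 1: the r with value(x, r) = f_cost, where
    value(x, r) = sum over y with dist(x, y) <= r of (r - dist(x, y))."""
    ds = sorted(dist(x, y) for y in points)   # includes dist(x, x) = 0
    prefix = 0.0
    for j, dj in enumerate(ds):
        prefix += dj
        # With the j + 1 nearest points inside the ball,
        # value(x, r) = (j + 1) * r - prefix; solve for r.
        r = (f_cost + prefix) / (j + 1)
        if j + 1 == len(ds) or r <= ds[j + 1]:
            return r

def facility_location(points, dist, f_cost):
    """Phase 2: visit points by increasing r_x and open x if no open
    facility lies within distance 2 * r_x of it."""
    r = {x: radius(x, points, dist, f_cost[x]) for x in points}
    S = []
    for x in sorted(points, key=r.__getitem__):
        if all(dist(x, a) > 2 * r[x] for a in S):
            S.append(x)
    return S, r
```

For example, with points 0, 1, 10, 11 on a line and uniform facility cost 1, every r_x equals 1 and the sketch opens one facility in each of the two pairs.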

For an efficient implementation of the above algorithm, let ε > 0 be such that
2 is an integral power of (1 + ε), that is, 2 = (1 + ε)^i for some integer i.
Increasing assignment costs just a little, we will view all distances as
rounded up to the nearest integral power of (1 + ε).

Recall that in our later usage of the r_x, all we care about is which vertices
are within distance 2r_x from x. Let i be such that
(1 + ε)^i ≤ r_x < (1 + ε)^{i+1}. Since 2 is an integral power of (1 + ε), there
are no rounded distances between (1 + ε)^i and (1 + ε)^{i+1}. Thus, we can
freely round r_x down to the nearest power of (1 + ε), even if this implies
value(x, r_x) < f-cost(x).

Algorithm B. Implements phase 1, finding r_v for all v ∈ V.
B.1. for all a ∈ V, set f_a = 0 (f_a = value(a, (1 + ε)^0)).
B.2. set U = V (U are the facilities a with unidentified r_a).
B.3. for i = 1, 2, ... while U ≠ ∅,
B.3.1. using Lemma 3 with W = V, estimate for each a ∈ U the number p_a of
vertices within distance (1 + ε)^i from a.
B.3.2. for each a ∈ U,
B.3.2.1. if (f-cost(a) - f_a) ≤ ((1 + ε)^{i+1} - (1 + ε)^i) p_a,
B.3.2.1.1. set r_a = (1 + ε)^i
B.3.2.1.2. remove a from U
else
B.3.2.2. f_a = f_a + ((1 + ε)^{i+1} - (1 + ε)^i) p_a

Algorithm C. Implements phase 2, constructing the set S.
C.1. set S = ∅
C.2. for i = 1, 2, ...,
C.2.1. let W be the set of vertices a ∈ V with r_a = (1 + ε)^i
C.2.2. remove from W all vertices within distance 2(1 + ε)^i from S
C.2.3. using Proposition 1, construct a maximal dist-2(1 + ε)^i independent set
U from W
C.2.4. add U to S

Theorem 4. We can solve the facility location problem within a factor 3 + o(1)
from optimality in Õ(m) time.

Proof. We have used approximate distances in the sense of rounding up to the
nearest power of (1 + ε), and we used approximate facility costs in the sense
that the p_a may be off by a factor (1 ± o(1)). The implicit rounding down of
the r_x was seen above to have no consequence. In phase 2, we made no further
approximation, so with ε = o(1), our total costs are only off by a factor
(1 ± o(1)) relative to the factor 3 in Lemma 4.

The time bound follows from the time bound in Proposition 1.

Abstract. A parameterized problem is fixed parameter tractable if it
admits a solving algorithm whose running time on input instance (I, k)
is f(k) · |I|^α, where f is an arbitrary function depending only on k.
Typically, f is some exponential function, e.g., f(k) = c^k for constant c.
We describe general techniques to obtain growth of the form f(k) = c^{√k}
for a large variety of planar graph problems. The key to this type of
algorithm is what we call the “Layerwise Separation Property” of a planar
graph problem. Problems having this property include PLANAR VERTEX
COVER, PLANAR INDEPENDENT SET, and PLANAR DOMINATING SET.



1 Introduction 

While many problems of practical interest tend to be intractable from a standard
complexity-theoretic point of view, in many cases such problems have natural
“structural” parameters, and practically relevant instances are often associated
with “small” values of these parameters. The notion of fixed parameter
tractability [?] tries to capture this intuition. This is done by taking into
account solving algorithms that are exponential with respect to the parameter,
but otherwise have polynomial time complexity. That is, on input instance
(I, k) one terms a (parameterized) problem fixed parameter tractable if it
allows for a solving algorithm running in time f(k) n^{O(1)}, where f is an
arbitrary function only depending on k and n = |I|. The associated complexity
class is called FPT. As fixed parameter tractability explicitly allows for
exponential time complexity concerning the parameter, the pressing challenge is
to keep the related “combinatorial explosion” as small as possible. In this
paper, we provide a general framework for NP-hard planar graph problems that
allows us to go from typically time c^k n^{O(1)} algorithms to time
c^{√k} n^{O(1)} algorithms (subsequently briefly denoted by
“c^{√k}-algorithms”), meaning an exponential speed-up.^1 The main contributions
of our work, thus, are

* Supported by the Deutsche Forschungsgemeinschaft (research project PEAL (Para- 
meterized complexity and Exact ALgorithms), NI 369/1-1). 

^1 Actually, whenever we can construct a so-called problem kernel of polynomial
size in polynomial time (which is often the case for parameterized problems),
then we can replace the term c^{√k} n^{O(1)} by c^{√k} k^{O(1)} + n^{O(1)}.



F. Orejas, P.G. Spirakis, and J. van Leeuwen (Eds.): ICALP 2001, LNCS 2076, pp. 261-^^^ 2001. 
(c) Springer-Verlag Berlin Heidelberg 2001

• to provide new results and a “structural breakthrough” for the parameterized
complexity of a large class of problems,

• to parallel and complement results for the approximability of planar graph
problems obtained by Baker [?],

• to methodize and extend previous work on concrete graph problems [?], and

• to systematically compute the bases in the exponential terms.

Fixed parameter tractability. The parameterized tractability approach tries
to restrict the seemingly inherent “combinatorial explosion” of NP-hard
problems to a “small part” of the input, the parameter. For instance, VERTEX
COVER allows for an algorithm with running time O(kn + 1.3^k), where k is the
size of the vertex cover to be constructed [?]. One direction in current
research is to investigate problems with fixed parameter algorithms of running
time c^k n^{O(1)} and to try to get the constant c as small as possible.
Getting small constant bases in the exponential factor f(k) is also our
concern, but here, we focus on functions f (asymptotically) growing as slowly
as possible.

Planar graph problems. Planar graphs build a natural and practically important
graph class. Many problems that are NP-complete for general graphs (such
as VERTEX COVER and DOMINATING SET) remain so when restricted to planar
graphs. Whereas many NP-complete graph problems are hard to approximate
in general graphs, Baker, in her well-known work [?], showed that many of them
possess a polynomial time approximation scheme for planar graphs. However,
the degree of the polynomial grows with the quality of the approximation.
Alternatively, finding an “efficient” exact solution in “reasonable exponential
time” is an interesting and promising research challenge.

Relations to previous work. In recent work, algorithms were presented that
constructively produce a solution for PLANAR DOMINATING SET and related
problems in time c^{√k} n [?]. To obtain these results, it was proven that the
treewidth of a planar graph with a dominating set of size k is bounded by
O(√k), and that a corresponding tree decomposition can be found in time
O(√k n). Building on that problem-specific work with its rather tailor-made
approach for dominating sets, here, we take a much broader perspective. From a
practitioner’s point of view, this means that, since the algorithms developed
here can be stated in a very general framework, only small parts have to be
changed to adapt them to the concrete problem. In this sense, our work differs
strongly from research directions where running times of algorithms are
improved in a very problem-specific manner (e.g., by extremely sophisticated
case-distinctions, as in the case of VERTEX COVER for general graphs). For
example, once one can show that a problem has the so-called “Layerwise
Separation Property,” one can run a general algorithm which quickly computes a
tree decomposition of guaranteed small width (independent of the concrete
problem).

Results. We provide a general methodology for the design of $c^{\sqrt{k}}$-algorithms.
A key to this is the notion of select&verify graph problems and the
introduction of the Layerwise Separation Property (see Section 3) of such problems in
connection with the concept of linear problem kernels (see Subsection 2.1). We
show that problems that have the Layerwise Separation Property and admit
either a tree decomposition based algorithm or an algorithm based on bounded
outerplanarity can be solved in time $O(c^{\sqrt{k}} n)$. For instance, these include
PLANAR VERTEX COVER, PLANAR INDEPENDENT SET, PLANAR DOMINATING SET, and
PLANAR EDGE DOMINATION, as well as variations of these, such as their weighted
versions. Moreover, we give explicit formulas to determine the base $c$ of the
exponential term with respect to the problem-specific parameters. For PLANAR
VERTEX COVER, e.g., we obtain an $O(2^{4\sqrt{3k}} n)$ time algorithm. The methods can
be generalized in a way that basically all FPT-problems that admit
tree-decomposition based algorithms can be attacked with our approach. A library
containing implementations of various algorithms sketched in this paper is
currently under development. It uses the LEDA package for graph algorithms,
and the results obtained so far are encouraging.

Fig. 1. Roadmap of our methodology for planar graph problems.

Review of presented methodology. In a first phase, one separates the graph
in a particular way (“layerwise”). The key property of a graph problem which
allows such an approach will be the so-called “Layerwise Separation Property.”
Corresponding details are presented in Section 3. It will be shown that such a
property holds for quite a large class of graph problems. In a second phase, the
problem is solved on the layerwisely separated graph. We present two
independent ways to achieve this in Section 4: either using the separators to set up a
tree decomposition of width $O(\sqrt{k})$ and solving the problem using this tree
decomposition, or using a combination of a trivial approach on the separators and
some algorithms working on graphs of bounded outerplanarity for the
partitioned rest graphs. Figure 1 gives a general overview of our methodology.

Several details and proofs had to be deferred to the full version [2].

2 Basic Definitions and Preliminaries 

We consider undirected graphs $G = (V, E)$, $V$ denoting the vertex set and $E$
denoting the edge set. Sometimes we refer to $V$ by $V(G)$. Let $G[D]$ denote
the subgraph induced by a vertex set $D$. We only consider simple (no double
edges) graphs without self-loops. We study planar graphs, i.e., graphs that can
be drawn in the plane without edge crossings. Let $(G, \phi)$ denote a plane graph,
i.e., a planar graph $G$ together with an embedding $\phi$. A face of a plane graph is
any topologically connected region surrounded by edges of the plane graph. The
one unbounded face of a plane graph is called the exterior face. We study the
following “graph numbers”: A vertex cover $C$ of a graph $G$ is a set of vertices
such that every edge of $G$ has at least one endpoint in $C$; the size of a vertex
cover with a minimum number of vertices is denoted by $\operatorname{vc}(G)$. An independent set
of a graph $G$ is a set of pairwise nonadjacent vertices; the size of an independent
set with a maximum number of vertices is denoted by $\operatorname{is}(G)$. A dominating set
$D$ of a graph $G$ is a set of vertices such that each of the vertices in $G$ lies
in $D$ or has at least one neighbor in $D$; the size of a dominating set with a
minimum number of vertices is denoted by $\operatorname{ds}(G)$. The corresponding problems
are (PLANAR) VERTEX COVER, INDEPENDENT SET, and DOMINATING SET.



2.1 Linear Problem Kernels 

Reduction to problem kernel is a core technique for the development of fixed
parameter algorithms. In a sense, the idea behind it is to cut off the
“easy parts” of a given problem instance such that only the “hard kernel” of
the problem remains, where, then, e.g., exhaustive search can be applied (with
reduced costs).

Definition 1. Let $\mathcal{L}$ be a parameterized problem, i.e., $\mathcal{L}$ consists of pairs $(I, k)$,
where problem instance $I$ has a solution of size $k$ (the parameter).^1 Reduction
to problem kernel^2 then means to replace problem $(I, k)$ by a “reduced” problem
$(I', k')$ (which we call the problem kernel) such that $k' \le c \cdot k$ and $|I'| \le q(k')$
with constant $c$, polynomial $q$, and $(I, k) \in \mathcal{L}$ iff $(I', k') \in \mathcal{L}$. Furthermore, we
require that the reduction from $(I, k)$ to $(I', k')$ (that we call kernelization) is
computable in polynomial time $T_K(|I|, k)$.
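To make Definition 1 concrete, the following is a minimal sketch of a kernelization for VERTEX COVER on general graphs. It uses the classic high-degree rule usually attributed to Buss, which is an assumption made for illustration: it is not the linear kernel discussed in this paper and only yields a kernel with at most $k'^2$ edges. The function name `buss_kernel` and the edge-list representation are hypothetical.

```python
def buss_kernel(edges, k):
    """Sketch of a reduction to problem kernel for VERTEX COVER.

    edges: iterable of pairs (u, v) with u != v
    k:     the parameter (allowed size of the vertex cover)
    Returns (kernel_edges, k_prime, forced) or None for a no-instance.
    """
    edges = {frozenset(e) for e in edges}
    forced = set()  # vertices that every cover of size <= k must contain
    while True:
        if k < 0:
            return None  # more forced vertices than the budget allows
        degree = {}
        for e in edges:
            for v in e:
                degree[v] = degree.get(v, 0) + 1
        high = [v for v, deg in degree.items() if deg > k]
        if not high:
            break
        # A vertex of degree > k must be in the cover; otherwise all of its
        # more than k neighbours would have to be taken instead.
        v = high[0]
        forced.add(v)
        edges = {e for e in edges if v not in e}
        k -= 1
    # Now every degree is <= k, so k vertices cover at most k * k edges.
    if len(edges) > k * k:
        return None
    return edges, k, forced
```

Here the reduced instance satisfies $k' \le k$ and has at most $k'^2$ edges, so it fits the definition with $c = 1$ and a quadratic polynomial $q$.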

Usually, having constructed a size $k^{O(1)}$ problem kernel in time $n^{O(1)}$, one
can improve the time complexity $f(k) n^{O(1)}$ of a fixed parameter algorithm to
$f(k) k^{O(1)} + n^{O(1)}$. Subsequently, our focus is on decreasing $f(k)$, and we do
not always refer to this simple fact. Often (cf. the subsequent example VERTEX
COVER), the best one can hope for the problem kernel is size linear in $k$,
a so-called linear problem kernel. For instance, using a theorem of Nemhauser
and Trotter, Chen et al. recently observed a problem kernel of size $2k$
for VERTEX COVER on general (not necessarily planar) graphs. According to the
current state of knowledge, this is the best one could hope for. As a further
example, note that due to the four color theorem for planar graphs and the
corresponding algorithm generating a four coloring, it is easy to see that
PLANAR INDEPENDENT SET has a problem kernel of size $4k$.

^1 In this paper, we assume the parameter to be a positive integer, although, in general,
it might also be from an arbitrary language (e.g., being a subgraph).

^2 Here, we give a somewhat “restricted definition” of reduction to problem kernel
which, however, applies to all practical cases we know.

Besides the positive effect of reducing the input size significantly, this
paper gives further justification, in particular, for the importance of linear
problem kernels. The point is that once we have a linear problem kernel, e.g., for
PLANAR VERTEX COVER or PLANAR INDEPENDENT SET, it is fairly easy to get
$c^{\sqrt{k}}$-algorithms for these problems based upon the famous planar separator
theorem. The constant factor in the problem kernel size directly influences the
value of the exponential base and hence, lowering the kernel size means improved
efficiency. We will show alternative, more efficient ways (without using the
planar separator theorem) of how to make use of linear problem kernels in a generic
way in order to obtain $c^{\sqrt{k}}$-algorithms for planar graph problems.

2.2 Tree Decomposition and Layer Decomposition of Graphs 

Definition 2. A tree decomposition of a graph $G = (V, E)$ is a pair
$(\{X_i \mid i \in I\}, T)$, where each $X_i \subseteq V$ is called a bag and $T$ is a tree
with the elements of $I$ as nodes, such that the following hold:

1. $\bigcup_{i \in I} X_i = V$;

2. for every edge $\{u, v\} \in E$, there is an $i \in I$ such that $\{u, v\} \subseteq X_i$;

3. for all $i, j, k \in I$, if $j$ lies on the path between $i$ and $k$ in $T$, then $X_i \cap X_k \subseteq X_j$.

The width of $(\{X_i \mid i \in I\}, T)$ is $\max\{|X_i| \mid i \in I\} - 1$. The treewidth $\operatorname{tw}(G)$
of $G$ is the minimum $\ell$ such that $G$ has a tree decomposition of width $\ell$.
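The three conditions of Definition 2 can be checked mechanically. The sketch below is illustrative only; the names `is_tree_decomposition` and `width` and the dictionary-based representation are assumptions, not part of the paper. Condition 3 is tested in its well-known equivalent form: for every vertex, the tree nodes whose bags contain it must induce a connected subtree of $T$.

```python
def _reachable(adj, start, allowed):
    """Nodes of `allowed` reachable from `start` without leaving `allowed`."""
    seen, stack = {start}, [start]
    while stack:
        i = stack.pop()
        for j in adj[i]:
            if j in allowed and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """bags: dict tree node -> set of graph vertices; tree_edges: pairs of nodes."""
    nodes = set(bags)
    adj = {i: set() for i in nodes}
    for i, j in tree_edges:
        adj[i].add(j)
        adj[j].add(i)
    # T must be a tree: |I| - 1 edges and connected.
    if not nodes or len(tree_edges) != len(nodes) - 1:
        return False
    if _reachable(adj, next(iter(nodes)), nodes) != nodes:
        return False
    # Condition 1: the bags cover exactly the vertex set.
    if set().union(*bags.values()) != set(vertices):
        return False
    # Condition 2: every edge lies inside some bag.
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            return False
    # Condition 3: the nodes holding a vertex form a connected subtree.
    for v in vertices:
        holding = {i for i in nodes if v in bags[i]}
        if _reachable(adj, next(iter(holding)), holding) != holding:
            return False
    return True


def width(bags):
    """Width of a decomposition: largest bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1
```

For a path on three vertices, the two bags containing its two edges, joined by one tree edge, form a decomposition of width 1.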

Details on tree decompositions can be found in the cited literature. Let
$G = (V, E)$ be a planar graph. The vertices of $G$ can be decomposed according to
the level of the “layer” in which they appear in an embedding $\phi$.

Definition 3. Let $(G = (V, E), \phi)$ be a plane graph.

a) The layer decomposition of $(G, \phi)$ is a disjoint partition of the vertex set $V$
into sets $L_1, \dots, L_r$, which are recursively defined as follows:

• $L_1$ is the set of vertices on the exterior face of $G$, and

• $L_i$ is the set of vertices on the exterior face of $G[V - \bigcup_{j=1}^{i-1} L_j]$ for $i = 2, \dots, r$.

We will denote the layer decomposition of $(G, \phi)$ by $\mathcal{L}(G, \phi) := (L_1, \dots, L_r)$.

b) The set $L_i$ is called the $i$th layer of $(G, \phi)$.

c) The (uniquely defined) number $r$ of different layers is called the
outerplanarity of $(G, \phi)$, denoted by $\operatorname{out}(G, \phi) := r$.

d) We define $\operatorname{out}(G)$ to be the smallest outerplanarity possible among all plane
embeddings, i.e., minimizing over all plane embeddings $\phi$ of $G$ we set

$$\operatorname{out}(G) := \min_{\phi} \operatorname{out}(G, \phi).$$

Proposition 1. Let $(G = (V, E), \phi)$ be a plane graph. The layer
decomposition $\mathcal{L}(G, \phi) = (L_1, \dots, L_r)$ can be computed in time $O(|V|)$.

2.3 Algorithms Based on Separators in Graphs

One of the most useful algorithmic techniques for solving computational prob- 
lems is divide-and-conquer. To apply this technique to planar graphs, we need 
graph separators and related notions. 

Graph separators and select&verify problems. Graph separators are
defined as follows. Let $G = (V, E)$ be an undirected graph. A separator $S \subseteq V$
of $G$ partitions $V$ into two sets $A$ and $B$ such that $A \cup B \cup S = V$ with
$A \cap B = A \cap S = B \cap S = \emptyset$ and no edge joins vertices in $A$ and $B$. In general,
of course, $A$, $B$ and $S$ will be non-empty.

Definition 4. A set $\mathcal{G}$ of tuples $(G, k)$, $G$ an undirected graph with vertex set
$V = \{v_1, \dots, v_n\}$ and $k$ a positive real number, is called a select&verify graph
problem if there exists a pair $(P_\cdot, \operatorname{opt})$ with $\operatorname{opt} \in \{\min, \max\}$, such that $P_\cdot$ is
a function that assigns to an undirected graph $G$ (with $n$ vertices) a polynomial
time computable function $P_G : \{0, 1\}^n \to \mathbb{R}_+ \cup \{\pm\infty\}$, such that

$$(G, k) \in \mathcal{G} \iff
\begin{cases}
\operatorname{opt}_{x \in \{0,1\}^n} P_G(x) \le k & \text{if } \operatorname{opt} = \min, \\
\operatorname{opt}_{x \in \{0,1\}^n} P_G(x) \ge k & \text{if } \operatorname{opt} = \max.
\end{cases}$$

It is an easy observation that every select&verify graph problem that additionally
admits a linear problem kernel of size $dk$ is solvable in time $O(2^{dk} k + T_K(n, k))$.

VERTEX COVER is an easy example for a select&verify graph problem. Here,
for $G = (V, E)$, one may use (with the convention $\infty \cdot 0 = 0$)

$$P_G(x) = \sum_{i=1}^{n} x_i + \sum_{\{v_i, v_j\} \in E} \infty \cdot (1 - x_i)(1 - x_j).$$
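The function $P_G$ for VERTEX COVER can be written down directly. The following sketch (with hypothetical names `p_vertex_cover` and `in_vertex_cover`, and vertices numbered from 0) evaluates $P_G$ and decides membership by the trivial search over all of $\{0,1\}^n$; it is meant only to illustrate the select&verify formulation, not the algorithms of this paper. The convention $\infty \cdot 0 = 0$ is realized by adding $\infty$ only for edges with both endpoints unselected.

```python
import math
from itertools import product


def p_vertex_cover(edges, x):
    """P_G(x): number of selected vertices, or infinity if x is not a cover.

    edges: list of pairs (i, j) of vertex indices in range(n)
    x:     0/1 tuple of length n (the "select" part)
    """
    penalty = sum(math.inf for i, j in edges if (1 - x[i]) * (1 - x[j]) == 1)
    return sum(x) + penalty


def in_vertex_cover(n, edges, k):
    """Decide (G, k) by minimizing P_G over {0,1}^n (exponential in n)."""
    best = min(p_vertex_cover(edges, x) for x in product((0, 1), repeat=n))
    return best <= k
```

Running the same search on a problem kernel with at most $dk$ vertices instead of on $G$ itself is what gives the $2^{dk}$ factor in the observation above.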



Algorithms based on separator theorems. Lipton and Tarjan have used
their famous separator theorem in order to design algorithms with a running time
of $O(c^{\sqrt{n}})$ for certain select&verify planar graph problems. This naturally implies
that, in the case of parameterized planar graph problems for which a linear kernel
is known, algorithms with running time $O(c'^{\sqrt{k}} + T_K(n, k))$ can be derived. As
worked out in [2], a straightforward application yields very bad constants, even
when dealing with improved versions of the planar separator theorem;
for instance, $c' \approx 40000$ for PLANAR VERTEX COVER. We will see
algorithms with much better constants in this paper. In addition, the advantages
of the approach pursued in this paper also lie in weaker assumptions. In some
cases, we may drop requirements such as linear problem kernels, replacing them
with the so-called “Layerwise Separation Property,” a seemingly less restrictive
demand.



3 Phase 1: Layerwise Separation 



We will exploit the layer structure of a plane graph in order to gain a “nice”
separation of the graph. It is important that a “yes”-instance $(G, k)$ (where $G$ is
a plane graph) of the graph problem $\mathcal{G}$ admits a so-called “layerwise separation”
of small size. By this, we mean, roughly speaking, a separation of the plane graph
$G$ (i.e., a collection of separators for $G$), such that each separator is contained
in the union of constantly many subsequent layers (see conditions 1 and 2 of
the following definition). For (fixed parameter) algorithmic purposes, it will be
important that the corresponding separators are “small” (see condition 3 below).

Definition 5. Let $(G = (V, E), \phi)$ be a plane graph of outerplanarity
$r := \operatorname{out}(G, \phi)$ and let $\mathcal{L}(G, \phi) = (L_1, \dots, L_r)$ be its layer decomposition. A
layerwise separation of width $w$ and size $s$ of $(G, \phi)$ is a sequence $(S_1, \dots, S_r)$ of
subsets of $V$, with the properties that:

1. $S_i \subseteq \bigcup_{j=i}^{i+(w-1)} L_j$, 2. $S_i$ separates layers $L_{i-1}$ and $L_{i+w}$, and 3. $\sum_{j=1}^{r} |S_j| \le s$.

Definition 6. A parameterized problem $\mathcal{G}$ for planar graphs is said to have
the Layerwise Separation Property (abbreviated by: LSP) of width $w$ and
size-factor $d$ if for each $(G, k) \in \mathcal{G}$ and every planar embedding $\phi$ of $G$, the plane
graph $(G, \phi)$ admits a layerwise separation of width $w$ and size $dk$.

3.1 How Can Layerwise Separations Be Obtained? 

The Layerwise Separation Property can be shown directly for many
parameterized graph problems. As an example, consider PLANAR VERTEX COVER. Here,
we get constants $w = 2$ and $d = 2$. In fact, for $(G, k) \in$ VERTEX COVER (where
$(G, \phi)$ is a plane graph) with a “witnessing” vertex cover $V'$ of size $k$, the sets
$S_i := (L_i \cup L_{i+1}) \cap V'$ form a layerwise separation, given the layer decomposition
$\mathcal{L}(G, \phi) = (L_1, \dots, L_r)$. Elsewhere, the non-trivial fact is proven that for PLANAR
DOMINATING SET, the LSP holds with constants $w = 3$ and $d = 51$.

Lemma 1. Let $\mathcal{G}$ be a parameterized problem for planar graphs that admits
a problem kernel of size $dk$. Then, the parameterized problem $\mathcal{G}'$ where each
instance is replaced by its problem kernel has the LSP of width 1 and
size-factor $d$.

With Lemma 1 and the size $2k$ problem kernel for VERTEX COVER (see
Subsection 2.1), we derive, for example, that PLANAR VERTEX COVER has the LSP
of width 1 and size-factor 2 (which is even better than what was shown above).
Using the $4k$ problem kernel for PLANAR INDEPENDENT SET, we see that this
problem has the LSP of width 1 and size-factor 4 on the set of reduced instances.

3.2 What Are Layerwise Separations Good for? 

The idea of the following is that, from a layerwise separation of small size (say
bounded by $O(k)$), we are able to choose a set of separators such that their size
is bounded by $O(\sqrt{k})$ and, at the same time, the subgraphs into which these
separators cut the original graph have outerplanarity bounded by $O(\sqrt{k})$.

Definition 7. Let $(G = (V, E), \phi)$ be a plane graph with layer decomposition
$\mathcal{L}(G, \phi) = (L_1, \dots, L_r)$. A partial layerwise separation of width $w$ is a sequence
$\mathcal{S} = (S_1, \dots, S_q)$ such that there exist $i_0 = 1 < i_1 < \dots < i_q < r = i_{q+1}$ such
that for $j = 1, \dots, q$:

1. $S_j \subseteq \bigcup_{\ell = i_j}^{i_j + (w-1)} L_\ell$,

2. $i_j + w \le i_{j+1}$ (so the sets in $\mathcal{S}$ are pairwise disjoint), and

3. $S_j$ separates layers $L_{i_j - 1}$ and $L_{i_j + w}$.

The sequence $\mathcal{C}_{\mathcal{S}} = (G_0, \dots, G_q)$ with

$$G_j := G\Bigl[\Bigl(\bigcup_{\ell = i_j}^{i_{j+1} + (w-1)} L_\ell\Bigr) - (S_j \cup S_{j+1})\Bigr], \quad j = 0, \dots, q,$$

is called the sequence of graph chunks obtained by $\mathcal{S}$. By default, we set
$S_i := \emptyset$ for $i < 1$ and $i > q$.

Theorem 1. Let $(G = (V, E), \phi)$ be a plane graph that admits a layerwise
separation of width $w$ and size $dk$. Then, for every $\psi \in \mathbb{R}_+$, there exists a partial
layerwise separation $\mathcal{S}(\psi)$ of width $w$ such that

1. $\max_{S \in \mathcal{S}(\psi)} |S| \le \psi \sqrt{dk}$ and

2. $\operatorname{out}(H) \le \sqrt{dk}/\psi + w$ for each graph chunk $H$ in $\mathcal{C}_{\mathcal{S}(\psi)}$.

Moreover, there is an algorithm with running time $O(\sqrt{k}\, n)$ which, for given $\psi$,
recognizes whether $(G, \phi)$ admits a layerwise separation of width $w$ and size $dk$
and, if so, computes $\mathcal{S}(\psi)$.

Proof. (Sketch) For $m = 1, \dots, w$, consider the integer sequences
$I_m = (m + jw)_{j = 0, 1, \dots}$ (restricted to indices in $\{1, \dots, r\}$) and the
corresponding sequences of separators $\mathcal{S}_m = (S_i)_{i \in I_m}$. Note
that each $\mathcal{S}_m$ is a sequence of pairwise disjoint separators. Since $(S_1, \dots, S_r)$ is
a layerwise separation of size $dk$, this implies that there exists a $1 \le m' \le w$
with $\sum_{S \in \mathcal{S}_{m'}} |S| \le dk/w$ (*).

For a given $\psi$, let $s := \psi \sqrt{dk}$. Define $\mathcal{S}(\psi)$ to be the subsequence of $\mathcal{S}_{m'}$ such
that $|S| \le s$ for all $S \in \mathcal{S}(\psi)$, and $|S| > s$ for all $S \in \mathcal{S}_{m'} - \mathcal{S}(\psi)$. This yields
condition 1. As to condition 2, suppose that $\mathcal{S}(\psi) = (S_{i_1}, \dots, S_{i_q})$. The number
of separators in $\mathcal{S}_{m'}$ that appear between $S_{i_j}$ and $S_{i_{j+1}}$ is $(i_{j+1} - i_j)/w$. Since
all of these separators have size $> s$, their number has to be bounded by $dk/(ws)$,
see (*). Therefore, $i_{j+1} - i_j \le \sqrt{dk}/\psi$ for all $j = 1, \dots, q - 1$. Hence, the chunks
$G[(\bigcup_{\ell = i_j}^{i_{j+1} + (w-1)} L_\ell) - (S_{i_j} \cup S_{i_{j+1}})]$ have outerplanarity at most $\sqrt{dk}/\psi + w$.

The proof can be turned into a constructive algorithm. This is outlined in
the full version [2]. □
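The selection step of the proof sketch can be illustrated in a few lines. The code below is a simplified sketch under stated assumptions: it works only on the list of separator sizes $|S_1|, \dots, |S_r|$ of an already given layerwise separation (finding the separators themselves is not shown), and the name `select_separators` is hypothetical.

```python
import math


def select_separators(sizes, w, d, k, psi):
    """Pick the partial layerwise separation S(psi) from the proof sketch.

    sizes: list with sizes[i - 1] = |S_i| for i = 1..r
    Returns (m_prime, kept) where kept lists the 1-based indices i_1 < ... < i_q.
    """
    r = len(sizes)
    if sum(sizes) > d * k:
        raise ValueError("not a layerwise separation of size d*k")
    # Residue classes I_m = (m, m + w, m + 2w, ...) for m = 1..w.
    classes = {m: list(range(m, r + 1, w)) for m in range(1, w + 1)}
    totals = {m: sum(sizes[i - 1] for i in idx) for m, idx in classes.items()}
    # The w classes partition the separators, so the lightest class has
    # total size at most d*k/w; this is inequality (*).
    m_prime = min(totals, key=totals.get)
    s = psi * math.sqrt(d * k)
    # Keep only the "small" separators of the chosen class (condition 1).
    kept = [i for i in classes[m_prime] if sizes[i - 1] <= s]
    return m_prime, kept
```

A larger $\psi$ keeps more and larger separators and therefore yields thinner chunks; this is the trade-off that is balanced when the treewidth bound is derived.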

4 Phase 2: Algorithms on Layerwisely Separated Graphs 

After Phase 1, we are left with a set of disjoint (layerwise) separators of size
$O(\sqrt{k})$ separating the graph into components, each of which has outerplanarity
bounded by $O(\sqrt{k})$.

4.1 Using Tree Decompositions

We will show how the existence of a layerwise separation of small size helps to
constructively obtain a tree decomposition of small width. The following result
is taken from the literature (Theorem 83 and Theorem 12 in the respective references).

Proposition 2. For a plane graph $(G, \phi)$, we have $\operatorname{tw}(G) \le 3 \cdot \operatorname{out}(G) - 1$. Such
a tree decomposition can be found in $O(\operatorname{out}(G) \cdot n)$ time.

Theorem 2. Let $(G, \phi)$ be a plane graph that admits a layerwise separation of
width $w$ and size $dk$. Then, we have $\operatorname{tw}(G) \le 2\sqrt{6dk} + (3w - 1)$. Such a tree
decomposition can be computed in time $O(k^{3/2} n)$.

Proof. (Sketch) By Theorem 1, for each $\psi \in \mathbb{R}_+$, there exists a partial layerwise
separation $\mathcal{S}(\psi) = (S_1, \dots, S_q)$ of width $w$ with corresponding graph chunks
$\mathcal{C}_{\mathcal{S}(\psi)} = (G_0, \dots, G_q)$, such that $\max_{S \in \mathcal{S}(\psi)} |S| \le \psi \sqrt{dk}$ and
$\operatorname{out}(G_i) \le \sqrt{dk}/\psi + w$ for $i = 0, \dots, q$. The algorithm that constructs a tree
decomposition $\mathcal{X}_\psi$ is:

1. Construct a tree decomposition $\mathcal{X}_i$ of width at most $3 \operatorname{out}(G_i) - 1$ for each
of the graphs $G_i$ (using the algorithm from Proposition 2).

2. Add $S_i$ and $S_{i+1}$ to every bag in $\mathcal{X}_i$ ($i = 0, \dots, q$).

3. Let $T_i$ be the tree of $\mathcal{X}_i$. Then, successively add an arbitrary connection
between the trees $T_i$ and $T_{i+1}$ in order to obtain a tree $T$.

The tree $T$, together with the constructed bags, gives a tree decomposition of $G$.
Its width $\operatorname{tw}(\mathcal{X}_\psi)$ is upper bounded by $(2\psi + 3/\psi)\sqrt{dk} + (3w - 1)$,
which is minimal if $\psi = \sqrt{6}/2$. Therefore, $\operatorname{tw}(\mathcal{X}_\psi) \le 2\sqrt{6dk} + (3w - 1)$. □
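The optimal trade-off parameter in the last step can be verified by a short calculation. The following worked derivation is a sketch that uses only the bounds stated in the proof; the auxiliary function name $g$ is introduced here purely for illustration.

```latex
% A bag of a decomposition of width at most 3 out(G_i) - 1 holds at most
% 3 out(G_i) vertices; step 2 adds the two separators S_i and S_{i+1}.
\begin{align*}
\text{width} + 1
  &\le 3\,\mathrm{out}(G_i) + |S_i| + |S_{i+1}| \\
  &\le 3\Bigl(\frac{\sqrt{dk}}{\psi} + w\Bigr) + 2\psi\sqrt{dk}
   = g(\psi)\,\sqrt{dk} + 3w,
   \qquad g(\psi) := 2\psi + \frac{3}{\psi}, \\
g'(\psi) &= 2 - \frac{3}{\psi^{2}} = 0
   \iff \psi = \sqrt{3/2} = \frac{\sqrt{6}}{2}, \\
g\Bigl(\frac{\sqrt{6}}{2}\Bigr) &= \sqrt{6} + \frac{6}{\sqrt{6}} = 2\sqrt{6}.
\end{align*}
% Since g''(\psi) = 6/\psi^3 > 0, this is a minimum, and hence
% tw(G) <= 2 sqrt(6) sqrt(dk) + (3w - 1) = 2 sqrt(6dk) + (3w - 1).
```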

For example, Theorem 2 and previous observations imply $\operatorname{tw}(G) \le 4\sqrt{3 \operatorname{vc}(G)} + 5$
and $\operatorname{tw}(G) \le 6\sqrt{34 \operatorname{ds}(G)} + 8$ for planar graphs $G$. Note that for general graphs,
no relation of the form $\operatorname{tw}(G) \le f(\operatorname{ds}(G))$ (for any function $f$) holds. For VERTEX
COVER, only the linear relation $\operatorname{tw}(G) \le \operatorname{vc}(G)$ can be shown easily.

In addition, Theorem 2 yields a $c^{\sqrt{k}}$-algorithm for certain graph problems.

Theorem 3. Let $\mathcal{G}$ be a parameterized problem for planar graphs. Suppose that
$\mathcal{G}$ has the LSP of width $w$ and size-factor $d$ and that there exists a time $\sigma^{\ell} n$
algorithm that decides $(G, k) \in \mathcal{G}$, if $G$ is given together with a tree decomposition
of width $\ell$.

Then, there is an algorithm to decide $(G, k) \in \mathcal{G}$ in time
$O(\sigma^{3w-1} \cdot 2^{\theta_1(\sigma, d)\sqrt{k}}\, n)$, where $\theta_1(\sigma, d) = 2 \log(\sigma) \sqrt{6d}$.

Proof. In time $O(\sqrt{k}\, n)$ (see Theorem 1), we can check whether an instance
$(G, k)$ admits a layerwise separation of width $w$ and size $dk$. If so, the algorithm
of Theorem 2 computes a tree decomposition of width at most $2\sqrt{6dk} + (3w - 1)$,
and we can decide $(G, k) \in \mathcal{G}$ by using the given tree decomposition algorithm in
time $O(\sigma^{2\sqrt{6dk} + (3w - 1)} n)$. If $(G, k)$ does not admit such a layerwise separation,
we know that $(G, k) \notin \mathcal{G}$, by definition of the LSP. □






J. Alber, H. Fernau, and R. Niedermeier 



Going back to our running examples, it is well-known that PLANAR VERTEX
COVER and PLANAR INDEPENDENT SET admit such a tree decomposition based
algorithm for $\sigma = 2$. For PLANAR VERTEX COVER, we have seen that the LSP of
width 1 and size-factor $d = 2$ holds. Hence, Theorem 3 guarantees an $O(2^{4\sqrt{3k}} n)$
algorithm for this problem. For PLANAR INDEPENDENT SET, we have a linear
problem kernel of size $4k$, hence, the LSP of width 1 and size-factor $d = 4$ holds,
which yields an $O(2^{4\sqrt{6k}} n)$ algorithm.
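These constants follow mechanically from the formulas of Theorems 2 and 3, as the small sketch below illustrates. The function names `theta1` and `treewidth_bound` are hypothetical, and the logarithm is taken to base 2, which is the reading under which $\sigma^{2\sqrt{6dk}} = 2^{\theta_1(\sigma, d)\sqrt{k}}$ holds.

```python
import math


def theta1(sigma, d):
    """Exponent factor of Theorem 3: theta_1(sigma, d) = 2 log2(sigma) sqrt(6 d)."""
    return 2 * math.log2(sigma) * math.sqrt(6 * d)


def treewidth_bound(w, d, k):
    """Upper bound of Theorem 2: 2 sqrt(6 d k) + (3 w - 1)."""
    return 2 * math.sqrt(6 * d * k) + (3 * w - 1)


# PLANAR VERTEX COVER: sigma = 2, LSP of width 1 and size-factor 2.
vc_exponent = theta1(2, 2)   # equals 4 * sqrt(3)
# PLANAR INDEPENDENT SET: sigma = 2, LSP of width 1 and size-factor 4.
is_exponent = theta1(2, 4)   # equals 4 * sqrt(6)
```

With $w = 2$, $d = 2$ the same bound gives $4\sqrt{3\operatorname{vc}(G)} + 5$, and with $w = 3$, $d = 51$ it gives $6\sqrt{34\operatorname{ds}(G)} + 8$, matching the treewidth relations stated after Theorem 2.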



4.2 Using Bounded Outerplanarity

We now turn our attention to select&verify problems subject to the assumption
that a solving algorithm of linear running time on the class of graphs of bounded
outerplanarity exists. This issue was addressed in earlier work; a variety of
examples can be found therein. We examine how, in this context, the notions of
select&verify problems and the LSP will lead to $c^{\sqrt{k}}$-algorithms.

Due to the lack of space, we only give an intuitive explanation of the notions
“weak glueability” and “CONSTRAINT $\mathcal{G}$” associated to a select&verify problem $\mathcal{G}$
which appear in the formulation of the following results. For a more detailed
definition we refer to the long version [2]. A problem $\mathcal{G}$ is weakly glueable
with $\lambda$ colors if a solution of $\mathcal{G}$ on an instance $G$ can be obtained by “merging”
solutions of CONSTRAINT $\mathcal{G}$ with $G[A \cup S]$ and $G[B \cup S]$, where $S$ separates $G$ into
two parts $A$ and $B$. Here, CONSTRAINT $\mathcal{G}$ is a variant of $\mathcal{G}$, in which it is already
fixed which vertices of $S$ belong to an admissible solution. The number $\lambda$, in
some sense, measures the complexity of the merging step. For example, PLANAR
VERTEX COVER and PLANAR INDEPENDENT SET are weakly glueable with $\lambda = 2$
colors, and PLANAR DOMINATING SET is weakly glueable with “essentially” $\lambda = 3$
colors.

Similar to Theorem 2, we construct a partial layerwise separation $\mathcal{S}(\psi)$ with
optimally adapted trade-off parameter $\psi$ to enable an efficient dynamic
programming algorithm. We omit the proof of the following theorem (see the full
version for details).

Theorem 4. Let $\mathcal{G}$ be a select&verify problem for planar graphs. Suppose that
$\mathcal{G}$ has the LSP of width $w$ and size-factor $d$, that $\mathcal{G}$ is weakly glueable with $\lambda$
colors, and that there exists an algorithm that solves the problem CONSTRAINT
$\mathcal{G}$ for a given graph $G$ in time $O(\tau^{\operatorname{out}(G)} n)$.

Then, there is an algorithm to decide $(G, k) \in \mathcal{G}$ in time
$O(\tau^{w} \cdot 2^{\theta_2(\lambda, \tau, d)\sqrt{k}}\, n)$, where $\theta_2(\lambda, \tau, d) = 2\sqrt{2d \log(\lambda) \log(\tau)}$.

It remains to say for which problems there exists a solving algorithm of the
problem CONSTRAINT $\mathcal{G}$ for a given graph $G$ in time $O(\tau^{\operatorname{out}(G)} n)$. For PLANAR
VERTEX COVER, we have $d = 2$, $w = 1$ and $\tau = 8$ (see the result of Baker, which
can be adapted to the constraint case fairly easily) and, hence, the approach in
Theorem 4 yields an $O(2^{4\sqrt{3k}} n)$ time algorithm.

As an alternative to Baker, we again may use tree decomposition based
approaches: Let $\mathcal{G}$ be a parameterized problem for planar graphs. Suppose that
there exists a time $\sigma^{\ell} n$ algorithm that solves CONSTRAINT $\mathcal{G}$, when $G$ is given
together with a tree decomposition of width $\ell$. Then, due to Proposition 2, there
is an algorithm that solves CONSTRAINT $\mathcal{G}$ in time $O(\tau^{\operatorname{out}(G)} n)$ for $\tau = \sigma^3$.

The following easy corollary helps comparing the approach from
Subsection 4.1 (i.e., Theorem 3) with the approach in this subsection (i.e., Theorem 4).

Corollary 1. Let $\mathcal{G}$ be a select&verify problem for planar graphs. Suppose that
$\mathcal{G}$ has the LSP of width $w$ and size-factor $d$, that $\mathcal{G}$ is weakly glueable with $\lambda$
colors, and that there exists a time $\sigma^{\ell} n$ algorithm that solves CONSTRAINT $\mathcal{G}$ for
a graph $G$, if $G$ is given together with a tree decomposition of width $\ell$.

Then, there is an algorithm to decide $(G, k) \in \mathcal{G}$ in time
$O(\sigma^{3w} \cdot 2^{\theta_3(\lambda, \sigma, d)\sqrt{k}}\, n)$, where $\theta_3(\lambda, \sigma, d) = 2\sqrt{6d \log(\lambda) \log(\sigma)}$.

The exponential factor of the algorithm in Corollary 1, i.e., $\theta_3(\lambda, \sigma, d)$, is related
to the corresponding exponent of Theorem 3, i.e., $\theta_1(\sigma, d)$, in the following way:
$\sqrt{\log \lambda} \cdot \theta_1(\sigma, d) = \sqrt{\log \sigma} \cdot \theta_3(\lambda, \sigma, d)$. From this, we derive that, if $\lambda > \sigma$, the
algorithm in Theorem 3 outperforms the one of Corollary 1, whereas, if $\lambda < \sigma$,
the situation is vice versa. However, in order to apply Corollary 1, we need the
three extra assumptions that we have a select&verify problem, that it is weakly
glueable, and that we can deal with the problem CONSTRAINT $\mathcal{G}$ in the treewidth
algorithm.
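The relation between the exponents can be checked numerically. The sketch below assumes base-2 logarithms and the reconstructed forms $\theta_2(\lambda, \tau, d) = 2\sqrt{2d \log(\lambda) \log(\tau)}$ and $\theta_3(\lambda, \sigma, d) = 2\sqrt{6d \log(\lambda) \log(\sigma)}$; all function names and the sample values of $\lambda$, $\sigma$, $d$ are hypothetical and chosen only for illustration.

```python
import math


def theta1(sigma, d):
    """Tree decomposition approach (Theorem 3)."""
    return 2 * math.log2(sigma) * math.sqrt(6 * d)


def theta2(lam, tau, d):
    """Bounded outerplanarity approach (Theorem 4)."""
    return 2 * math.sqrt(2 * d * math.log2(lam) * math.log2(tau))


def theta3(lam, sigma, d):
    """Theorem 4 combined with Proposition 2, i.e. tau = sigma**3 (Corollary 1)."""
    return 2 * math.sqrt(6 * d * math.log2(lam) * math.log2(sigma))


def better_approach(lam, sigma, d):
    """Name the approach with the smaller exponent factor for given constants."""
    if theta1(sigma, d) < theta3(lam, sigma, d):
        return "Theorem 3"
    if theta1(sigma, d) > theta3(lam, sigma, d):
        return "Corollary 1"
    return "tie"
```

For PLANAR VERTEX COVER with $\lambda = 2$, $\tau = 8$, $d = 2$, the value `theta2(2, 8, 2)` equals $4\sqrt{3}$, the same exponent factor as obtained from the tree decomposition approach.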

5 Conclusion 

To some extent, this paper can be seen as the “parameterized complexity
counterpart” to what was developed by Baker in the context of approximation
algorithms. We describe two main ways (namely linear problem kernels and
problem-specific approaches) to achieve the novel concept of Layerwise
Separation Property, from which again, two approaches (tree decomposition and
bounded outerplanarity) lead to $c^{\sqrt{k}}$-algorithms for planar graph problems (see
Figure 1 for an overview). A slight modification of our presented techniques can
be used to extend our results to parameterized problems that admit a problem
kernel of size $p(k)$ (not necessarily linear!). In this case, the running time can
be sped up from $2^{O(p(k))} n^{O(1)}$ to $2^{O(\sqrt{p(k)})} n^{O(1)}$ (see [2] for details). Basically
all FPT-problems that admit treewidth based algorithms can be handled by our
methods.

Future research topics raised by our work include to further improve the
(“exponential”) constants, e.g., by a further refined and more sophisticated “layer
decomposition tree”; to investigate and extend the availability of linear problem
kernels for all kinds of planar graph problems; to provide implementations of our
approach accompanied by sound experimental studies, thus taking into account
that all our analysis is worst case and often overly pessimistic. Finally, a more
general question is whether there are other “problem classes” that allow for $c^{\sqrt{k}}$
fixed parameter algorithms. Cai and Juedes [7], however, very recently showed
the surprising result that for a list of parameterized problems (e.g., VERTEX
COVER on general graphs) $c^{o(k)}$-algorithms are impossible unless FPT = W[1].

Abstract. It is shown that for essentially all MAX SNP-hard optimization
problems finding exact solutions in subexponential time is not possible
unless W[1] = FPT. In particular, we show that $O(2^{o(k)} p(n))$
parameterized algorithms do not exist for Vertex Cover, Max Cut, Max
C-Sat, and a number of problems on bounded degree graphs such as
Dominating Set and Independent Set, unless W[1] = FPT. Our
results are derived via an approach that uses an extended parameterization
of optimization problems and associated techniques to relate the
parameterized complexity of problems in FPT to the parameterized complexity
of extended versions that are W[1]-hard.



1 Introduction 

Substantial progress has recently been made in building better and better
parameterized algorithms for a variety of NP-complete problems. Consider the problem
of determining whether a graph with $n$ nodes has a Vertex Cover of size $k$.
Starting with the early work of Buss, who discovered an algorithm for the
problem with a running time of the form $O(f(k) + kn)$, the running time of
parameterized algorithms for Vertex Cover has been improved to
$O(2^k k^2 + kn)$ by Downey and Fellows,
$O(1.325^k k^2 + kn)$ by Balasubramanian et al., $O(1.3196^k k^2 + kn)$ by Downey,
Fellows, and Stege, $O(1.29175^k k^2 + kn)$ by Niedermeier and Rossmanith,
and $O(1.271^k k^2 + kn)$ by Chen, Kanj, and Jia. Similar improvements
have been made for other NP-complete problems [2]. In particular, we mention
the case for Planar Dominating Set. As shown by Downey and Fellows,
this problem is known to be fixed parameter tractable via an $O(11^k |G|)$ algorithm.
However, this result was recently improved to $O(2^{O(\sqrt{k})} n)$ by Alber, Bodlaender,
Fernau, and Niedermeier.

Noting the progress on algorithms for Planar Dominating Set, it is
natural to ask if similar progress can be made for Vertex Cover. In particular, it
is natural to ask if the current $O(2^{O(k)} p(n))$ upper bound on Vertex Cover can
be improved to $O(2^{o(k)} p(n))$. As our main result, we show that this is unlikely
since the existence of such an algorithm implies that the W-hierarchy collapses
at the first level, i.e., W[1] = FPT. With this in mind, it is natural to ask why
it is possible to build a parameterized algorithm for Planar Dominating Set
that runs in time $O(2^{o(k)} p(n))$ when the existence of such an algorithm for
Vertex Cover implies that W[1] = FPT. The answer to this question seems to
lie in the approximability of these two problems. While both of these problems
are NP-complete, it is known that Planar Dominating Set has a PTAS.
The same is not true for Vertex Cover unless P = NP because Vertex
Cover is MAX SNP-hard. As we show here, the fact that Vertex Cover
is MAX SNP-hard means that it does not have a subexponential parameterized
algorithm unless W[1] = FPT.

* This work was supported in part by the National Science Foundation research grant
CCR-000248.

Our results are obtained using new parameterized proof techniques. In
particular, we examine generalized parameterizations of optimization problems and
relate the complexities of various parameterizations. For each maximization
problem $\Pi$, we define $\Pi^{(r,s)}$ to be the parameterized problem that determines
whether $OPT_\Pi(I) \ge r(I) + k\, s(|I|)$, for functions $r$ and $s$. Analogous
parameterizations are defined for minimization problems. As we show here, the
parameterized complexity of these problems depends largely on the function $s$. We show
that for certain optimization problems $\Pi$ such as Max C-Sat, $\Pi^{(r,s)}$ is
parameterized tractable when $s = 1$ or even $o(\log n)$, but $\Pi^{(r,s)}$ becomes W[1]-hard
when $s = \Theta(\log n)$.

This extended abstract is structured as follows. In Section 2, we provide
necessary notation concerning parameterized complexity theory, and we
introduce a general framework for examining parameterized versions of optimization
problems. In Section 2, we also begin to examine the relationships among the
problems $\Pi^{(r,s)}$ for various functions $r$ and $s$. In Theorem 1 we show that if $\Pi^{(r,1)}$
is computable in time $O(2^{o(k)} p(n))$, then $\Pi^{(r,\log n)}$ is parameterized tractable.
In Section 3, we examine the parameterized tractability of problems in MAX
SNP. In Theorem 3 we show that if some MAX SNP-hard problem $\Pi_1$ has
an $O(2^{o(k)} p(n))$ parameterized algorithm, then every problem $\Pi_2$ in MAX SNP
has an $O(2^{o(k)} q(n))$ parameterized algorithm. In a further theorem of that
section, we examine the complexity of Max C-Sat$^{(r,s)}$ for the function $r(\phi) = r'm$,
where $r'$ is a rational number and $m$ is the number of clauses in $\phi$. We show that if Max
C-Sat has an $O(2^{o(k)} p(n))$ parameterized algorithm, then Max C-Sat$^{(r,1)}$ has an
$O(2^{o(k)} q(n))$ parameterized algorithm. In Section 4, we prove our main
technical result, namely, that Max C-Sat$^{(r,\log n)}$ is W[1]-hard. Note that many short
proofs are omitted from this extended abstract.

Combining the results from Sections 2, 3, and 4 gives the main result of
this work. Consider the hypothesis that some MAX SNP-hard problem $\Pi$ has
an $O(2^{o(k)} p(n))$ parameterized algorithm. Since Max C-Sat is in MAX SNP,
this hypothesis implies that Max C-Sat has an $O(2^{o(k)} q_1(n))$ parameterized
algorithm. By the results of Section 3, it follows that Max C-Sat$^{(r,1)}$ has an
$O(2^{o(k)} q_2(n))$ parameterized algorithm. By an application of Theorem 1, we have
that Max C-Sat$^{(r,\log n)}$ is parameterized tractable. Since Max C-Sat$^{(r,\log n)}$ is
W[1]-hard by the main technical result of Section 4, this implies that W[1] = FPT.

We note that earlier work by Impagliazzo, Paturi, and Zane also indicates
that Vertex Cover and other NP-complete problems likely do not have
subexponential parameterized algorithms. In particular, their work defines a notion of
completeness under “SERF” (sub-exponential reduction family) reductions for
the syntactic class SNP that was originally defined by Papadimitriou and
Yannakakis. As shown there, the existence of a subexponential-time algorithm
for any problem that is SNP-hard under SERF reductions implies that every
problem in SNP has a subexponential-time algorithm. In their work, many
NP-complete problems, including Vertex Cover, Independent Set, and 3-Sat,
were shown to be SNP-hard under SERF reductions.

To compare our work with this earlier work, consider again the case for Vertex
Cover. Since $k$ is bounded above by $n$, the existence of a
subexponential-time parameterized algorithm for Vertex Cover implies the
existence of a subexponential-time algorithm for the usual decision version.
Hence, the underlying hypothesis of our work is stronger than that of
Impagliazzo, Paturi, and Zane. However, our conclusion is also stronger. As
shown in Corollary 17.7 of [12], if $W[1] = FPT$ then 3-Sat has a
subexponential-time algorithm. Since 3-Sat is SNP-complete under SERF
reductions, this implies that every problem in SNP has a subexponential-time
algorithm. It is not known if the converse is true.



2 Preliminaries 

We begin by introducing necessary concepts concerning optimization problems and
the theory of parameterized complexity. For additional information, we refer
readers to the comprehensive text on parameterized complexity by Downey and
Fellows [12] and the classic text on NP-completeness by Garey and Johnson.

To begin, a parameterized problem $\Pi$ is defined over the set
$\Sigma^* \times N$, where $\Sigma$ is a finite alphabet and $N$ is the set of
natural numbers. Therefore, each instance of the problem $\Pi$ is a pair
$(I, k)$, where $k$ is called the parameter. A problem $\Pi$ is parameterized
tractable if there is an algorithm running in time $O(f(k)\,p(|I|))$ that
solves the parameterized problem $\Pi$ for some polynomial $p$ and some
recursive function $f$. The complexity class FPT contains all parameterized
tractable problems.

The theory of parameterized complexity defines a variety of reductions that
preserve parameterized tractability. Here we employ the standard parameterized
m-reduction. Briefly, $\Pi_1$ is parameterized reducible to $\Pi_2$ if there
are functions $g : N \to N$ and $f : \Sigma^* \times N \to \Sigma^*$ such that
$(x, k) \in \Pi_1 \iff (f(x, k), g(k)) \in \Pi_2$ and $f(x, k)$ is computable
in time $g(k)\,p(|x|)$ for some polynomial $p$. Based on this reduction, a
hierarchy of increasingly difficult parameterized problems,
$FPT \subseteq W[1] \subseteq W[2] \subseteq \cdots \subseteq W[P]$, can be
defined. This is the W-hierarchy. A problem $\Pi$ is $W[t]$-hard if every
problem in $W[t]$ is parameterized reducible to $\Pi$, and $W[t]$-complete if
it is also in $W[t]$. Our work relies on the fact that Independent Set is
$W[1]$-complete.

Many parameterized problems are obtained from optimization problems via
parameterizations. Following the earlier work of Cai and Chen, we use a
standard parameterization of optimization problems. For each optimization
problem $\Pi$, the standard parameterized version of $\Pi$ is to determine,
given an instance $I$ of $\Pi$ and an integer $k$, whether the optimal solution
cost $OPT_\Pi(I)$ is $\geq k$ for maximization problems or $\leq k$ for
minimization problems.

As pointed out by Mahajan and Raman, $OPT_\Pi(I)$ is always large for certain
problems such as Max Cut and Max Sat. In these cases, the question of whether
$OPT_\Pi(I) \geq k$ is trivial for small values of $k$. To overcome this
difficulty, they suggest that for problems Max Sat and Max Cut, parameterized
problems should be defined to determine whether
$OPT(I) \geq \lceil m/2 \rceil + k$. In this paper, this formulation is
extended to $OPT_\Pi(I) \geq r(I) + k\,s(|I|)$, for arbitrary functions $r$ and
$s$. Note that the formulation of parameterized Max Sat by Mahajan and Raman
can be achieved by using $r(\phi) = \lceil m/2 \rceil$, where $m$ is the number
of clauses in the boolean formula $\phi$.

Definition 1. Let $\Pi$ be a maximization problem with instances $I_\Pi$ and an
optimal cost function $OPT_\Pi$. For functions $r : I_\Pi \to Q$ and
$s : N \to Q$, the parameterized problem $\Pi^{(r,s)}$ is defined as follows:
Given an instance $(I, k)$, determine whether $OPT_\Pi(I) \geq r(I) + k\,s(n)$,
where $n = |I|$.

$\Pi^{(r,s)}$ is called an extended parameterized version of $\Pi$. When
$r(I) = 0$ and $s(n) = 1$, $\Pi^{(0,1)}$ is called the standard parameterized
version of $\Pi$. We often use $\Pi$ to denote $\Pi^{(0,1)}$ when our intention
is clear from the context.
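To make Definition 1 concrete, the following is a minimal brute-force sketch for Max Sat. The function names, the clause encoding, and the choice of taking $|I|$ to be the number of clauses are assumptions made here for illustration only; the exhaustive search is exponential and is not meant as a parameterized algorithm.

```python
from itertools import product
from math import ceil

def max_sat_opt(clauses, num_vars):
    """Brute-force OPT: the largest number of clauses any assignment satisfies.

    A clause is a list of nonzero ints; literal v means x_v, -v means NOT x_v.
    """
    best = 0
    for bits in product([False, True], repeat=num_vars):
        satisfied = sum(
            any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        )
        best = max(best, satisfied)
    return best

def extended_instance_is_yes(clauses, num_vars, k, r, s):
    """Decide OPT(I) >= r(I) + k * s(n); n = |I| is taken as the clause count (assumption)."""
    n = len(clauses)
    return max_sat_opt(clauses, num_vars) >= r(clauses) + k * s(n)

# Hypothetical instance: (x1), (NOT x1), (x1 OR x2), (NOT x2); m = 4 clauses.
phi = [[1], [-1], [1, 2], [-2]]
half_m = lambda formula: ceil(len(formula) / 2)   # r(phi) = ceil(m/2)
const_one = lambda n: 1                           # s(n) = 1
```

With `r = half_m` and `s = const_one` this corresponds to the Mahajan and Raman formulation; passing `lambda formula: 0` for `r` gives the standard parameterized version.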

Parameterized versions of minimization problems can be defined in a similar
fashion. In the literature, most parameterized tractability proofs involve
explicit constructions of a solution to witness each positive answer. In
particular, the following “stronger” definition of parameterized tractability
was introduced in the work of Cai and Chen.

Definition 2. Let $\Pi$ be a maximization problem. The parameterized problem
$\Pi^{(r,s)}$ is parameterized tractable with witness if there is an
$O(f(k)\,p(|I|))$ algorithm that determines the membership of $(I, k)$ in
$\Pi^{(r,s)}$ and also produces a solution to $\Pi$ that witnesses
$OPT_\Pi(I) \geq r(I) + k\,s(|I|)$ whenever $(I, k) \in \Pi^{(r,s)}$, for some
recursive function $f$ and polynomial $p$.

Although we primarily use the standard definition of parameterized tractability
throughout this extended abstract, Definition 2 is used in section 3 to show
that L-reductions preserve parameterized tractability. The fact that
L-reductions preserve parameterized tractability with witness is crucial to our
main results. Note, however, that the close relationship between search and
decision means that the terms parameterized tractable and parameterized
tractable with witness are equivalent in many cases. As explained in sections 4
and 5, this relationship allows us to state our main results without reference
to the witness characterization.

To begin, we mention some properties of the parameterized problems
$\Pi^{(r,s)}$. First note that $s$ can be as large as $o(\log n)$ without
significantly changing the parameterized complexity of $\Pi^{(r,s)}$. Consider
the following technical lemma.

Lemma 1. A parameterized problem is parameterized tractable if it is solvable
in $O(2^{O(s(n)k)}\,p(n))$ steps for some unbounded and nondecreasing function
$s(n) = o(\log n)$.

Lemma 1 leads immediately to the following theorem.

Theorem 1. Let $\Pi^{(r,1)}$ be a parameterized problem solvable in
$O(2^{O(k)}\,p(n))$ steps for some polynomial $p$. Then for any unbounded
nondecreasing function $s(n) = o(\log n)$, $\Pi^{(r,s)}$ is parameterized
tractable.

It is natural to ask whether the above theorem holds when
$s = \Theta(\log n)$. As we show in section 4, this is unlikely since it
implies that $W[1] = FPT$. Indeed, $\Pi^{(r,\log n)}$ appears to be
parameterized intractable for certain problems. Furthermore, the parameterized
intractability of $\Pi^{(r,\log n)}$ implies a strong lower bound on the
running times of parameterized algorithms for $\Pi^{(r,1)}$. This is one of the
keys to our overall approach.

Theorem 2. If $\Pi^{(r,1)}$ is solvable in $O(2^{o(k)}\,p(n))$ steps, then
$\Pi^{(r,\log n)}$ is parameterized tractable.
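The proof of Theorem 2 is among those omitted from this extended abstract. One way the argument might go is sketched below. The function $h$ and the threshold $g(k)$ are names introduced here for illustration and do not come from the source; the sketch also assumes that $h \geq 1$ is computable and ignores rounding of $\log n$.

```latex
% Sketch only. Assumption: the running time 2^{o(K)} is written as 2^{K/h(K)}
% for some unbounded, nondecreasing function h with h >= 1.
\begin{align*}
(I,k) \in \Pi^{(r,\log n)}
  &\iff OPT_\Pi(I) \ge r(I) + k\log n
   \iff (I,\, k\log n) \in \Pi^{(r,1)}. \\
\intertext{Running the assumed algorithm with parameter $K = k\log n$ takes
time $O(2^{K/h(K)}\,p(n))$. Two cases:}
h(K) \ge k &\implies 2^{K/h(K)} \le 2^{K/k} = 2^{\log n} = n, \\
h(K) < k &\implies K < g(k) := \min\{x : h(x) \ge k\}
          \implies 2^{K/h(K)} \le 2^{K} \le 2^{g(k)}.
\end{align*}
% Either way the time is O((n + 2^{g(k)}) p(n)), which is of the form f(k) q(n).
```

Under these assumptions the running time has the form $f(k)\,q(n)$ required by the definition of parameterized tractability.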

3 Parameterized Tractability of MAX SNP 

As noted in earlier work, a number of NP-complete problems such as Vertex
Cover, Vertex Cover-B, Max Sat, Max c-Sat, Max k-Cut, and Independent Set for
bounded degree graphs are parameterized tractable. In particular, each of these
problems can be solved in time $O(2^{O(k)}\,p(n))$ for some polynomial $p$. It
is natural to ask whether the running times of parameterized algorithms for
these problems can be significantly improved. In this section, we work towards
answering this question through an investigation of parameterized versions of
MAX SNP-hard optimization problems.

The class MAX SNP was introduced by Papadimitriou and Yannakakis to study the
approximability of optimization problems. As defined there, an optimization
problem $\Pi$ is in the syntactic class MAX SNP$_0$ if its optimal cost
$OPT_\Pi(I)$ for each instance $I$ can be expressed as
$OPT_\Pi(I) = \max_S |\{v : \phi(v, I, S)\}|$, where both the instance $I$ and
the solution $S$ are described as finite structures. The class MAX SNP contains
all optimization problems that can be reduced to some problem in the class
MAX SNP$_0$ through the following approximation-preserving reduction.

Definition 3. Let $\Pi_1$ and $\Pi_2$ be two optimization problems with cost
functions $f_1$ and $f_2$. $\Pi_1$ L-reduces to $\Pi_2$ if there are two
polynomial time algorithms A and B and two constants $\alpha, \beta > 0$ such
that for each instance $I_1$ of $\Pi_1$,

i.) the algorithm A produces an instance $I_2 = A(I_1)$ such that
$OPT_{\Pi_2}(I_2) \leq \alpha\,OPT_{\Pi_1}(I_1)$, and

ii.) given any solution $S_2$ for $I_2$ with cost $f_2(I_2, S_2)$, algorithm B
produces a solution $S_1$ for $I_1$ with cost $f_1(I_1, S_1)$ such that
$|OPT_{\Pi_1}(I_1) - f_1(I_1, S_1)| \leq \beta\,|OPT_{\Pi_2}(I_2) - f_2(I_2, S_2)|$.

It is known from the work of Cai and Chen that the standard parameterized
versions of all maximization problems in MAX SNP are parameterized tractable.
The proof of this earlier result shows that L-reductions preserve parameterized
tractability. Here we provide a more detailed account of how L-reductions
preserve parameterized tractability among the standard parameterized versions
of optimization problems. In particular, it can be shown that L-reductions
preserve subexponential-time computability.

Lemma 2. Let $\Pi_1$ and $\Pi_2$ be two optimization problems such that $\Pi_1$
L-reduces to $\Pi_2$, and assume that the cost function for $\Pi_2$ is
integer-valued. If $\Pi_2^{(0,1)}$ is solvable with witness in time
$O(f(k)\,p(n))$ for some recursive function $f$ and polynomial $p$, then
$\Pi_1^{(0,1)}$ can be solved in time $O(k\,f(O(k))\,q(n))$ for some polynomial
$q$.

Because Max 3-Sat$^{(0,1)}$ is parameterized tractable with witness, we obtain
the following result through Lemma 2.

Corollary 1. The standard parameterized version of each optimization problem
in the class MAX SNP is solvable in time $O(2^{O(k)}\,p(n))$ for some
polynomial $p$.

Lemma 2 allows us to give a natural connection between the parameterized
complexity of MAX SNP-hard problems and the parameterized complexity of
problems in MAX SNP.

Theorem 3. Let $\Pi_1$ be a MAX SNP-hard (under L-reductions) optimization
problem with an integer-valued cost function. If $\Pi_1^{(0,1)}$ is solvable
with witness in time $O(2^{o(k)}\,p(n))$ for some polynomial $p$, then for any
optimization problem $\Pi_2$ in MAX SNP, $\Pi_2^{(0,1)}$ is solvable in time
$O(2^{o(k)}\,q(n))$ for some polynomial $q$.



To show that similar results hold for the extended parameterized versions of 
certain problems, we use the following technique that bridges the gap between 
the standard and extended parameterized versions of optimization problems. 

Lemma 3. Let $c > 0$ be any constant integer, let $r' \geq \frac{1}{2}$ be any
rational number, and define $r(\phi) = r' \cdot m$, where $m$ is the number of
clauses in the boolean formula $\phi$. If the standard parameterized problem
Max c-Sat is solvable in time $O(f(k)\,p(n))$ for some polynomial $p$, then the
extended parameterized problem Max c-Sat$^{(r,1)}$ is solvable in time
$O(f(11k + 8)\,q(n))$, for some polynomial $q$.

Proof. Assume that there is an algorithm A solving the parameterized problem
Max c-Sat as stated. We describe an algorithm B that solves Max c-Sat$^{(r,1)}$
by calling the algorithm A. The algorithm B uses the approach found in
Proposition 8 and Theorem 9 of Mahajan and Raman.

Let $F$ be a set of clauses, and let $(F, k)$ be a given instance for
Max c-Sat$^{(r,1)}$. The algorithm B operates as follows.

Input $(F, k)$.

Let $U$ be the set of unit clauses in $F$.

(1) While $U$ contains clauses of the form $(x)$ and $(\neg x)$,
    remove both clauses and reduce $k$ by 1.

(2) If $|U| \geq r'm + k$ return “YES”

(3) If $\lceil m/2 \rceil + |F - U|/4 - 1 \geq r'm + k$ return “YES”

(4) Otherwise, call algorithm A on input $(F, r'm + k)$ and return
    its answer.



To see that this algorithm correctly solves Max c-Sat$^{(r,1)}$ on input
$(F, k)$, first consider the set $U$ of unit clauses in $F$. If $U$ contains
two clauses of the form $(x)$ and $(\neg x)$, both can be removed since any
truth assignment of $x$ satisfies exactly one of these two clauses. In this
case, the value $k$ can be reduced by 1. If $U$ contains no such pair of
clauses, then all the clauses in $U$ can be satisfied simultaneously. Hence, if
$|U| \geq r'm + k$, there is an assignment to the variables satisfying at least
$r'm + k$ clauses and the algorithm B correctly answers yes. Furthermore, by
Proposition 8 of Mahajan and Raman, there exists an assignment that satisfies
at least $\lceil m/2 \rceil + |F - U|/4 - 1$ clauses in $F$. Hence, if
$\lceil m/2 \rceil + |F - U|/4 - 1 \geq r'm + k$, at least $r'm + k$ clauses of
$F$ can be satisfied simultaneously. So, the algorithm B also answers correctly
in this case. In all other cases, algorithm B calls algorithm A on input
$(F, r'm + k)$. Since algorithm A correctly solves Max c-Sat$^{(0,1)}$,
algorithm B is also correct.

It is easy to see that steps (1)-(3) of algorithm B can be performed in a
polynomial number of steps. Step (4) involves a call to algorithm A on input
$(F, r'm + k)$. Since $m/2 + |F - U|/4 - 1 < r'm + k$, we have that
$|F - U| < 4(r' - \frac{1}{2})m + 4k + 4$. Since $|U| < r'm + k$, we have that
$m = |U| + |F - U| < r'm + k + 4(r' - \frac{1}{2})m + 4k + 4$. Therefore, we
have that the number of clauses in $F$ is bounded by a linear function of $k$,
i.e.,¹

$$m < \frac{5k + 4}{3 - 5r'}.$$

Because $r' \geq \frac{1}{2}$, it follows that $m < 10k + 8$ and
$k' = r'm + k < 10r'k + 8r' + k$. Substituting this new value of $k$ into the
running time of the algorithm A gives the required running time for the
algorithm B.



Lemma 3 immediately gives the following theorem.

Theorem 4. Let $c > 0$ be any constant integer, let $r' \geq \frac{1}{2}$ be
any rational number, and define $r(\phi) = r' \cdot m$, where $m$ is the number
of clauses in the boolean formula $\phi$. If the parameterized problem
Max c-Sat$^{(0,1)}$ is solvable in time $O(2^{o(k)}\,p(n))$ for some polynomial
$p$, then Max c-Sat$^{(r,1)}$ is solvable in time $O(2^{o(k)}\,q(n))$, for some
polynomial $q$. □



¹ This also suggests that step (4) will only be executed when
$r' < \frac{3}{5}$.

4 Parameterized Complexity of MAX SNP-Hard Problems

In contrast to our results in the previous section, we now show that for some
optimization problem $\Pi$, the parameterized version $\Pi^{(r,\log n)}$ may
not be parameterized tractable.

Theorem 5. Let $r'$ be a rational constant such that $\frac{1}{2} < r' < 1$ and
define $r(\phi) = r'm$, where $m$ is the number of clauses in $\phi$. Then for
any natural number $c \geq 3$, Max c-Sat$^{(r,\log n)}$ is $W[1]$-hard.

Proof. It suffices to show that the $W[1]$-hard problem Independent Set can be
transformed to Max c-Sat$^{(r,\log n)}$ through a standard parameterized
m-reduction.

The parameterized problem Independent Set is defined as follows. Given a graph
$G = (V, E)$ of $n$ nodes and an integer $k$, determine whether there is a
subset $V'$ of $V$ of size $k$ in $G$ such that no two vertices in $V'$ are
connected by an edge in $G$. We describe a process to transform $(G, k)$ to an
instance for problem Max c-Sat$^{(r,\log n)}$ for some $r' > \frac{1}{2}$. The
reduction consists of the following five steps.

Step 1. Construct an anti-monotonic Boolean circuit $C_1$ from $G$ as follows.
Let $V = \{v_1, \ldots, v_n\}$. The circuit $C_1$ consists of $n$ input
variables $X = (x_1, \ldots, x_n)$, an AND gate as the output, and $|E|$
intermediate OR gates, each of which has the output wired to an input of the
AND gate. For each edge $e = (v_i, v_j)$ in $G$, an OR gate $g_e$ is
constructed with two inputs $\neg x_i$ and $\neg x_j$. By associating a setting
of the variables $x_1, \ldots, x_n$ with a subset of $V$ in the natural way, it
is straightforward to verify that $C_1$ has a satisfying assignment of weight
$k$ if and only if $G$ has an independent set of size $k$.
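The correspondence in Step 1 can be checked by brute force on a small graph. The sketch below is illustrative only; the function names and the example graph are hypothetical, and a gate `(i, j)` stands for the OR gate with inputs NOT x_i and NOT x_j.

```python
from itertools import combinations

def circuit_c1(edges):
    """One OR gate (NOT x_i OR NOT x_j) per edge; the output is the AND of all gates."""
    return list(edges)

def c1_accepts(gates, true_vars):
    """Evaluate C_1 when exactly the variables in true_vars are set to true."""
    return all(not (i in true_vars and j in true_vars) for (i, j) in gates)

def weight_k_satisfying_assignments(n, gates, k):
    """All weight-k assignments (as sets of true variable indices) accepted by C_1."""
    return [
        set(chosen)
        for chosen in combinations(range(1, n + 1), k)
        if c1_accepts(gates, set(chosen))
    ]

# Hypothetical example graph: the path v1 - v2 - v3 - v4.
path_edges = [(1, 2), (2, 3), (3, 4)]
path_gates = circuit_c1(path_edges)
```

Each set returned by `weight_k_satisfying_assignments` is exactly an independent set of size `k` in the example graph.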

Step 2. Convert the circuit $C_1$ into another anti-monotonic circuit $C_2$
that has $nk$ input variables. These $nk$ variables are organized into $k$
blocks of $n$. Let
$Y = [(y_1^{(1)}, \ldots, y_n^{(1)}), \ldots, (y_1^{(k)}, \ldots, y_n^{(k)})]$
be this set of input variables. As with $C_1$, we have an AND gate as output.
In addition, the circuit $C_2$ contains three sets of OR gates, $E_1$, $E_2$,
and $E_3$, defined as follows.

$E_1 = \{ (\neg y_i^{(t)} \vee \neg y_j^{(t)}) : 1 \leq t \leq k,\ 1 \leq i < j \leq n \}$,

$E_2 = \{ (\neg y_i^{(s)} \vee \neg y_j^{(t)}) :$ for each gate $\neg x_i \vee \neg x_j \in C_1$, $1 \leq i, j \leq n$, $1 \leq s, t \leq k \}$,

and $E_3 = \{ (\neg y_i^{(s)} \vee \neg y_i^{(t)}) : 1 \leq s < t \leq k,\ 1 \leq i \leq n \}$.

Each of the OR gates has an output wired to an input of the AND gate.

Notice that the three sets of OR gates enforce specific conditions. The set
$E_1$ enforces the condition that no more than one variable in each block of
$n$ can be set to true. The set $E_3$ enforces the condition that no more than
one variable in position $j$ ($y_j^{(t)}$) of some block is set to true. Sets
$E_1$ and $E_3$ force any satisfying assignment to contain at most $k$
variables that are set to true, with at most one coming from each block of $n$
and each position $j$. Intuitively, the $i$th variable in a block being set to
true will correspond to the vertex $i$ being in an independent set $V'$.

It is easy to show that $C_2$ has a weight $k$ satisfying assignment if and
only if $C_1$ has a weight $k$ satisfying assignment.
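The three gate sets of Step 2 can be generated mechanically and the claimed equivalence tested exhaustively on a tiny graph. This is a sketch under stated assumptions: a variable is modelled as a pair `(t, i)` (position `i` in block `t`), a gate `(a, b)` stands for NOT a OR NOT b, and all names are hypothetical.

```python
from itertools import combinations

def circuit_c2(n, k, edges):
    """OR gates of C_2; each gate (a, b) stands for (NOT a OR NOT b).

    A variable is a pair (t, i): position i (1..n) inside block t (1..k).
    """
    blocks = range(1, k + 1)
    positions = range(1, n + 1)
    # E1: at most one true variable inside each block.
    e1 = [((t, i), (t, j)) for t in blocks for i in positions for j in positions if i < j]
    # E2: the two endpoints of an edge are never both selected.
    e2 = [((s, i), (t, j)) for (i, j) in edges for s in blocks for t in blocks]
    # E3: the same position is never selected in two different blocks.
    e3 = [((s, i), (t, i)) for s in blocks for t in blocks if s < t for i in positions]
    return e1 + e2 + e3

def weight_k_satisfying(n, k, gates):
    """All weight-k assignments of the nk variables that satisfy every gate."""
    variables = [(t, i) for t in range(1, k + 1) for i in range(1, n + 1)]
    return [
        set(chosen)
        for chosen in combinations(variables, k)
        if all(not (a in chosen and b in chosen) for (a, b) in gates)
    ]

# Hypothetical example graph: the path v1 - v2 - v3 - v4.
path_edges = [(1, 2), (2, 3), (3, 4)]
```

On the path graph every independent set of size 2 appears once per ordering of its vertices across the two blocks, and no weight-3 assignment exists because the path has no independent set of size 3.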

Step 3. Transform the anti-monotonic circuit $C_2$ into a monotonic circuit
$C_3$ that contains $2k \log n$ input variables, organized into $k$ blocks of
size $2 \log n$. Let $Z = [z^{(1)}, \ldots, z^{(k)}]$ be the $k$ blocks of
input variables, where for each $t = 1, \ldots, k$,
$z^{(t)} = (u_1^{(t)}, v_1^{(t)}, \ldots, u_{\log n}^{(t)}, v_{\log n}^{(t)})$
is a vector of $2 \log n$ variables. In this construction, $z^{(t)}$
corresponds to the $t$th block in $Y$. This requires some explanation.

A desired assignment to $Y$ in $C_2$ will have exactly one variable assigned to
true in each block. For each block $t$ of $Y$, the variable that is assigned to
true, say $y_i^{(t)}$, can be specified by its position $i$ within the block.
When $1 \leq i \leq n$, the position $i$ can be encoded by a binary number $B$
of length $\log n$. Let $B = b_1^{(t)} \cdots b_{\log n}^{(t)}$ with each
$b_l^{(t)} \in \{0, 1\}$. In $z^{(t)}$, each bit $b_l^{(t)}$ is encoded by a
pair of variables $u_l^{(t)}, v_l^{(t)}$. For each block $t = 1, \ldots, k$ and
each bit position $l = 1, \ldots, \log n$,

(1) $b_l^{(t)} = 1$ if and only if $u_l^{(t)} = 1$ and $v_l^{(t)} = 0$, and

(2) $b_l^{(t)} = 0$ if and only if $u_l^{(t)} = 0$ and $v_l^{(t)} = 1$.

Notice that each input variable in $Y$ can be represented by an AND of input
variables from $Z$. Let gate $g_i^{(t)}$ in $C_3$ represent the input variable
$y_i^{(t)}$ in $Y$, where $g_i^{(t)} = \bigwedge_{l=1}^{\log n} w_l^{(t)}$ with
$w_l^{(t)} = u_l^{(t)}$ if $b_l^{(t)} = 1$ in $B = i$ and
$w_l^{(t)} = v_l^{(t)}$ if $b_l^{(t)} = 0$ in $B = i$. Since $C_2$ is an
anti-monotonic circuit, we only need to represent the negation of an input
variable $y_i^{(t)}$ in $Y$. For this purpose, we use
$\bar{g}_i^{(t)} = \neg g_i^{(t)} = \bigvee_{l=1}^{\log n} \bar{w}_l^{(t)}$,
where $\bar{w}_l^{(t)} = u_l^{(t)}$ if $b_l^{(t)} = 0$ in $B = i$ and
$\bar{w}_l^{(t)} = v_l^{(t)}$ if $b_l^{(t)} = 1$ in $B = i$. It is not hard to
verify the correctness of this representation.
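The pair encoding of Step 3 and the monotone gate that replaces a negated variable can be illustrated on a single block. The sketch assumes zero-based positions `0 .. n-1` with `n` a power of two, and the function names are hypothetical. A block is a list of `(u_l, v_l)` pairs, most significant bit first.

```python
def encode_position(i, num_bits):
    """Return the (u_l, v_l) pairs encoding position i: bit 1 -> (1, 0), bit 0 -> (0, 1)."""
    pairs = []
    for l in range(num_bits):
        bit = (i >> (num_bits - 1 - l)) & 1
        pairs.append((1, 0) if bit == 1 else (0, 1))
    return pairs

def g(i, block, num_bits):
    """AND gate for y_i: true iff the block encodes position i."""
    target = encode_position(i, num_bits)
    # Pick u_l where bit l of i is 1, and v_l where it is 0.
    return all((u if tu == 1 else v) == 1 for (u, v), (tu, _tv) in zip(block, target))

def g_bar(i, block, num_bits):
    """Monotone OR gate standing in for NOT y_i."""
    target = encode_position(i, num_bits)
    # Pick the complementary variable of each pair: v_l where bit l of i is 1, else u_l.
    return any((v if tu == 1 else u) == 1 for (u, v), (tu, _tv) in zip(block, target))
```

On blocks in which exactly one variable of every pair is true, `g_bar(i, block, num_bits)` agrees with `not g(i, block, num_bits)`, which is the property the construction relies on.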

Continuing the construction, each OR gate $\neg y_i^{(s)} \vee \neg y_j^{(t)}$
in circuit $C_2$ is represented by a gate
$\bar{g}_i^{(s)} \vee \bar{g}_j^{(t)}$. This is an OR of $2 \log n$ variables
in $Z$. Additionally, we need to guarantee that each pair of input variables
$u_l^{(t)}, v_l^{(t)}$ always take exclusive values in a desired assignment. To
enforce this, we introduce a set $H$ of gates, where
$H = \{ h_l^{(t)} = u_l^{(t)} \vee v_l^{(t)} : l = 1, \ldots, \log n,\ t = 1, \ldots, k \}$.
The gates in $H$ force at least one variable from each pair
$u_l^{(t)}, v_l^{(t)}$ to evaluate to true. Since there are exactly
$2k \log n$ variables, a weight $k \log n$ satisfying assignment causes exactly
one of $u_l^{(t)}, v_l^{(t)}$ to be true.

It is straightforward to show that $C_3$ has a weight $k \log n$ satisfying
assignment if and only if $C_2$ has a weight $k$ satisfying assignment.

Step 4. Reformulate the weighted satisfiability problem for the monotone
circuit $C_3$ into a parameterized Max c-Sat$^{(r,\log n)}$ problem. From step
3, $C_3$ is a monotonic circuit with $2k \log n$ input variables that is an AND
of ORs. Note that all of the OR gates in $C_3$ either have fan-in 2 or fan-in
$s = 2 \log n$. We now build a boolean formula $F$ in CNF for $C_3$. Define

$F_1 = \{ C_g = (w_1 \vee \cdots \vee w_j) :$ gate $g = \bigvee_{i=1}^{j} w_i$ is in $C_3 \}$,

$F_2 = \{ (w) : w$ is an input variable of $C_3 \}$, and

$H' = \{ (u \vee v), (\neg u \vee \neg v) : u, v$ are two paired input variables of $C_3 \}$.

Furthermore, let $F_3$ be the set containing $(2k \log n + 1)$ copies of each
clause in $H'$. Let $F = F_1 \cup F_2 \cup F_3$. Note that $|F_2| = 2k \log n$.

If $C_3$ has a weight $k \log n$ satisfying assignment, then there is an
assignment of the variables of $F$ that satisfies $|F_1| + |F_3| + k \log n$
clauses in $F$. Similarly, if $F$ has an assignment that satisfies
$|F_1| + |F_3| + k \log n$ clauses, then all the clauses in $F_3$ must evaluate
to true. If not, then at least $2k \log n + 1$ clauses in $F_3$ evaluate to
false, and since $|F_2| = 2k \log n$, at most $|F_1| + |F_3| - 1$ clauses of
$F$ are satisfied. This is a contradiction. Moreover, if all the clauses in
$F_3$ evaluate to true, then exactly $k \log n$ variables are set to true.
Hence, all the clauses in $F_1$ must evaluate to true. Therefore, $C_3$ has a
weight $k \log n$ satisfying assignment.
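The counting argument of Step 4 can be tested exhaustively on a toy circuit. In the sketch below, `len(pairs)` plays the role of $k \log n$, a literal is a `(name, sign)` tuple, and the two example circuits are hypothetical; the code is a brute-force check, not part of the reduction itself.

```python
from itertools import product

def build_formula(c3_gates, pairs):
    """Return (variables, F1, F2, F3) for a monotone AND-of-ORs circuit."""
    variables = [x for pair in pairs for x in pair]
    copies = len(variables) + 1          # plays the role of 2k log n + 1
    f1 = [[(w, True) for w in gate] for gate in c3_gates]
    f2 = [[(w, True)] for w in variables]
    h_prime = []
    for u, v in pairs:
        h_prime.append([(u, True), (v, True)])
        h_prime.append([(u, False), (v, False)])
    f3 = [clause for clause in h_prime for _ in range(copies)]
    return variables, f1, f2, f3

def max_satisfied(variables, clauses):
    """Largest number of clauses satisfied by any assignment (exhaustive search)."""
    best = 0
    for bits in product([False, True], repeat=len(variables)):
        a = dict(zip(variables, bits))
        best = max(best, sum(any(a[x] == sign for x, sign in c) for c in clauses))
    return best

def reaches_threshold(c3_gates, pairs):
    """Is some assignment satisfying at least |F1| + |F3| + (number of pairs) clauses?"""
    variables, f1, f2, f3 = build_formula(c3_gates, pairs)
    threshold = len(f1) + len(f3) + len(pairs)
    return max_satisfied(variables, f1 + f2 + f3) >= threshold

def has_balanced_satisfying_assignment(c3_gates, pairs):
    """Does the circuit accept some assignment with exactly len(pairs) true variables?"""
    variables = [x for pair in pairs for x in pair]
    for bits in product([False, True], repeat=len(variables)):
        a = dict(zip(variables, bits))
        if sum(bits) == len(pairs) and all(any(a[w] for w in gate) for gate in c3_gates):
            return True
    return False

pairs = [("u1", "v1"), ("u2", "v2")]
h_gates = [("u1", "v1"), ("u2", "v2")]
yes_gates = h_gates + [("u1", "u2")]
no_gates = h_gates + [("u1", "u2"), ("v1", "v2"), ("u1", "v2"), ("v1", "u2")]
```

For both example circuits, `reaches_threshold` and `has_balanced_satisfying_assignment` agree, matching the equivalence argued in the text.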

To complete the conversion to a formula in c-CNF, it suffices to convert 
all the large ORs in Fi to clauses of size c. This can be done using additional 
variables as in the standard reduction from SAT to 3-SAT |3 p.438]. If these 
new clauses are placed into $F_1$, then, as verified in the previous paragraph,
$C_3$ has a weight $k \log n$ satisfying assignment if and only if $F$ has an
assignment satisfying $|F_1| + |F_3| + k \log n$ clauses.

Step 5. Let $N = |F_1| + |F_3|$. Note that the total number of clauses in $F$
is $N + 2k \log n$, where $n$ is the number of vertices in the original graph
$G$. We next pad some new unit clauses into $F$ so that there exists an
assignment to the variables satisfying $r'm + k \log n$ clauses if and only if
$G$ has an independent set of size $k$, where $m$ is the number of clauses.

Now, add $M$ new variables and add one unit clause to $F$ for each new variable
and its negation. This new formula $F$ has $m = 2M + N + 2k \log n$ clauses,
and there exists an assignment to the variables satisfying
$M + N + k \log n$ clauses if and only if $G$ has an independent set of size
$k$. It suffices to show that we can pick a value for $M$ such that
$r'm + k \log n = M + N + k \log n$.

The appropriate value for $M$ must satisfy
$r' = (M + k \log n + N)/(2M + 2k \log n + N)$. We can rewrite this as
$r' = 1 - \frac{M + k \log n}{2(M + k \log n) + N}$, hence
$M = \frac{(1 - r')N}{2r' - 1} - k \log n$. Because $N \gg k \log n$, such an
$M$ exists for any $\frac{1}{2} < r' < 1$. Moreover, we can compute $M$ from
$r'$, $N$, $k$, and $\log n$ and produce the correct number of unit clauses.
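As a small worked check of the relation $r' = (M + k \log n + N)/(2M + 2k \log n + N)$, the sketch below solves it for $M$ with exact rational arithmetic. The function name and the sample numbers are hypothetical, `K` stands for $k \log n$, and integrality of $M$ is not enforced here.

```python
from fractions import Fraction

def padding_size(r_prime, N, K):
    """Solve r' = (M + K + N) / (2M + 2K + N) for M, where K stands for k log n."""
    r = Fraction(r_prime)
    if not (Fraction(1, 2) < r < 1):
        raise ValueError("r' must lie strictly between 1/2 and 1")
    return (1 - r) * N / (2 * r - 1) - K

# Hypothetical numbers: r' = 3/4, N = 1000, K = 20.
M = padding_size(Fraction(3, 4), 1000, 20)
m = 2 * M + 1000 + 2 * 20          # total number of clauses after padding
```

With these sample values `Fraction(3, 4) * m` equals `M + 1000 + 20`, so the displayed relation holds for the computed `M`.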

An adjustment must be made to the factor $\log n$. It can be verified that $N$
is a polynomial in $n$. Moreover, $M$ is linear in $N$. So
$\log m = O(\log N) = O(\log n)$. Therefore, we can add $2dk \log n$ unit
clauses to $F$, for some constant $d$, so that exactly $r'm + k \log m$ clauses
can be satisfied. This completes the reduction.

Finally, note that the reduction takes an instance $(G, k)$ of Independent Set
and produces an instance $(F, k)$ of Max c-Sat$^{(r,\log n)}$. Since this is a
parameterized m-reduction, it follows that Max c-Sat$^{(r,\log n)}$ is
$W[1]$-hard.

Corollary 2. Let $r'$ be a rational constant such that
$\frac{1}{2} < r' < 1$, and define $r(\phi) = r'm$, where $m$ is the number of
clauses in $\phi$. The problem Max Sat$^{(r,\log n)}$ is $W[1]$-hard.

Theorem 5 completes the technical results leading up to our main result.

Theorem 6. Let $\Pi$ be a MAX SNP-hard optimization problem with an
integer-valued cost function. The standard parameterized version of $\Pi$
cannot be solved with witness in time $O(2^{o(k)}\,p(n))$ for any polynomial
$p(n)$ unless $W[1] = FPT$.

Proof. Assume that for some MAX SNP-hard optimization problem $\Pi$, its
standard parameterized version $\Pi^{(0,1)}$ is solvable with witness in time
$O(2^{o(k)}\,p(n))$ for some polynomial $p(n)$. Then by Theorem 3,
Max c-Sat$^{(0,1)}$ is solvable in time $O(2^{o(k)}\,q(n))$ for some polynomial
$q(n)$. By Theorem 4, an $O(k\,2^{o(k)}\,q(n))$ algorithm exists for
Max c-Sat$^{(r,1)}$ for any $r$. By Theorem 2, Max c-Sat$^{(r,\log n)}$ is
parameterized tractable. Together with Theorem 5, this implies $W[1] = FPT$.

Since the proof of Theorem 6 relies on Theorem 3, it does not appear that we
can easily remove the word “witness” from the statement of our main result.
However, in practice, it is often the case that the complexities of decision
problems and their witness versions are closely related. In the case of Vertex
Cover, it is easy to show that Vertex Cover is solvable in time
$O(2^{o(k)}\,p(n))$ if and only if it is solvable with witness in time
$O(2^{o(k)}\,q(n))$ for polynomials $p$ and $q$. Hence, Theorem 6 gives the
following immediate corollary.

Corollary 3. The parameterized problems Max Sat, Max c-Sat, Vertex Cover,
Vertex Cover-B, Independent Set-B, Dominating Set-B, and Max c-Cut cannot be
solved in time $O(2^{o(k)}\,p(n))$ for any polynomial $p(n)$ unless
$W[1] = FPT$. □

5 Conclusion 

Our main results provide a simple framework for proving strong lower bounds on
the parameterized complexity of problems within FPT. To achieve a
$2^{o(k)}\,p(n)$ lower bound, it suffices to prove that a problem is
MAX SNP-hard and that the witness version nicely reduces to the decision
version. As mentioned by Bellare and Goldwasser, it is well-known that search
is polynomial-time Turing reducible to decision for every NP-complete problem.
To obtain Corollary 3 we require a more restrictive notion of reducibility
between the witness and decision versions of parameterized problems. In
particular, we require that the reduction between the witness and decision
version does not greatly increase the value of the parameter $k$. It is not
immediately obvious that search reduces to decision for every NP-complete
problem when this requirement is added. Nevertheless, it is the case that the
witness version reduces to the decision version in this way for many
NP-complete problems, such as those mentioned in Corollary 3.

More generally, our techniques provide a framework for relating the
complexities of various parameterizations of the same problem. We believe that
this framework may lead to lower bounds on non MAX SNP-hard problems as well.

Abstract. We prove a lower bound of $\Omega(n^{4/3} \log^{1/3} n)$ on the
randomized decision tree complexity of any nontrivial monotone n-vertex
bipartite graph property, thereby improving the previous bound of
$\Omega(n^{4/3})$ due to Hajnal [H91]. Our proof works by improving a
probabilistic argument in that paper, which also improves a graph packing
lemma proved there. By a result of Gröger [G92] our complexity lower bound
carries over from bipartite to general monotone n-vertex graph properties.
Graph packing being a well-studied subject in its own right, our improved
packing lemma and the probabilistic technique used to prove it may be of
independent interest.

Keywords: Decision tree complexity, monotone graph properties, randomized
complexity, randomized algorithms, graph packing, probabilistic method.



1 Introduction 

Consider the problem of deciding whether or not a given input graph $G$ has a
certain (isomorphism invariant) property $P$. The graph is given by an oracle
which answers queries of the form “is $(x, y)$ an edge of $G$?” A decision tree
algorithm for $P$ makes a sequence of such queries to the oracle, where each
query may depend upon the information obtained from the previous ones, until
sufficient information about $G$ has been obtained to decide whether or not $P$
holds for $G$, whereupon it either accepts or rejects. Let $\mathcal{A}_P$
denote the set of decision tree algorithms for $P$ and for
$A \in \mathcal{A}_P$, let $\mathrm{cost}(A, G)$ denote the number of queries
that $A$ asks on input $G$. The quantity
$C(P) = \min_A \max_G \mathrm{cost}(A, G)$ is called the deterministic decision
tree complexity, or simply the deterministic complexity of $P$.
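For very small $n$, the quantity $C(P)$ can be computed directly from its min-max definition. The following is an illustrative exhaustive search only (its running time grows extremely fast with $n$); the function names and the example property are hypothetical.

```python
from functools import lru_cache
from itertools import combinations, product

def deterministic_complexity(n, has_property):
    """Exhaustive minimax search for C(P) on n-vertex graphs (tiny n only).

    has_property takes a frozenset of edges (i, j) with i < j and returns a bool.
    """
    slots = list(combinations(range(n), 2))
    m = len(slots)

    def completions(known):
        """All graphs consistent with the answers so far (None = not yet queried)."""
        free = [i for i in range(m) if known[i] is None]
        for bits in product((0, 1), repeat=len(free)):
            full = list(known)
            for i, b in zip(free, bits):
                full[i] = b
            yield frozenset(slots[i] for i in range(m) if full[i])

    @lru_cache(maxsize=None)
    def cost(known):
        answers = {has_property(g) for g in completions(known)}
        if len(answers) == 1:
            return 0                      # the answers so far already decide P
        # Choose the best next query; the oracle answers adversarially.
        return min(
            1 + max(cost(known[:i] + (b,) + known[i + 1:]) for b in (0, 1))
            for i in range(m)
            if known[i] is None
        )

    return cost((None,) * m)

def is_nonempty(edges):
    """Example monotone property: the graph has at least one edge."""
    return len(edges) > 0
```

For the 3-vertex property "has at least one edge", the search reports that all three possible edges must be queried in the worst case.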

A randomized decision tree algorithm for $P$ is a probability distribution
$\mathcal{D}$ over $\mathcal{A}_P$, and its cost (on input $G$) is the
expectation of $\mathrm{cost}(A, G)$ with $A$ drawn from $\mathcal{D}$:

$$\mathrm{cost}^R(\mathcal{D}, G) = \sum_{A \in \mathcal{A}_P} \Pr_{\mathcal{D}}[A]\, \mathrm{cost}(A, G).$$

The randomized decision tree complexity, or simply the randomized complexity,
of $P$ is defined to be

$$C^R(P) = \min_{\mathcal{D}} \max_G \mathrm{cost}^R(\mathcal{D}, G).$$

* This work was supported in part by NSF Grant CCR-96-23768, ARO Grant
DAAH04-96-1-0181, and NEC Research Institute.

An n-vertex graph property is said to be nontrivial if there is at least one
n-vertex graph which has the property and at least one which does not. It is
said to be monotone if addition of edges does not destroy the property. Let
$\mathcal{P}_n$ denote the set of all nontrivial monotone n-vertex graph
properties. We shall need to consider the analogously defined notion of
randomized complexity for bipartite graph properties; let $\mathcal{P}_{n,n}$
denote the set of all nontrivial monotone $(n, n)$-bipartite graph properties.

Clearly any n-vertex graph property $P$ satisfies
$C^R(P) \leq C(P) \leq \binom{n}{2} = O(n^2)$. A classic result of Rivest and
Vuillemin shows that any $P \in \mathcal{P}_n$ has $C(P) = \Omega(n^2)$, which
settles the deterministic complexity of monotone properties up to a constant.¹
We remark that monotonicity is crucial for this result; there are examples of
nontrivial non-monotone graph properties with $C(P) = O(n)$.

The situation is far less satisfactory for randomized complexity. The first
nonlinear lower bound on $C^R(P)$, for general $P \in \mathcal{P}_n$, was an
$\Omega(n \log^{1/12} n)$ bound proven by Yao [Y87]. This was subsequently
improved by Valerie King [K88] to $\Omega(n^{5/4})$ and later by Hajnal [H91]
to $\Omega(n^{4/3})$. The only other significant work in the area is due to
Gröger [G92] who established lower bounds stronger than Hajnal’s for certain
special classes of graph properties.

No property in $\mathcal{P}_n$ is known to have randomized complexity below
$n^2/4$. Closing this gap between Hajnal’s lower bound and this upper bound is
one of the most important open problems concerning the complexity of graph
properties. It is remarkable that this quarter-century-old problem has yielded
so few results. In this paper we take a small step by proving

Theorem 1.1 (Main Theorem). Any property $P \in \mathcal{P}_n$ satisfies
$C^R(P) = \Omega(n^{4/3} \log^{1/3} n)$.²

Our proof will rely on an important theorem from the pioneering work of Yao
[Y87], as well as on a framework developed by Hajnal [H91]. In this framework
we associate with a graph property a special pair of graphs which cannot be
“packed” together. We then argue that if the property has low randomized
complexity, then certain degree upper bounds can be proven for these special
graphs. Finally, we use these degree bounds to prove that the special graphs
can be “packed”, thereby arriving at a contradiction.

The notion of graph packing, which we shall formally define later, is a
well-studied subject in its own right. A packing lemma (Lemma 2.8) which we
establish in this paper is therefore of independent interest since it improves
a packing theorem due to Hajnal and Szegedy.

¹ However, in the world of deterministic complexity, a far more interesting
conjecture is that any $P \in \mathcal{P}_n$ has $C(P) = \binom{n}{2}$ exactly.
Remarkably, this conjecture remains open to this day.

² Throughout this paper $\log x$ denotes the logarithm of $x$ to the base 2.
The natural logarithm of $x$ is denoted by $\ln x$.

The rest of the paper is organized as follows: in Section 2, we define some
preliminary notions, describe the framework alluded to above and prove
Theorem 1.1 assuming our graph packing lemma. In Section 3 we state and prove
a technical lemma which is then used in Section 4 to prove the packing lemma
we need, thereby completing the proof of Theorem 1.1. We conclude with some
remarks in Section 5.



2 Preliminaries and Proof Outline 



The first important step is to change the objects of study from graph
properties to bipartite graph properties. A result of Gröger [G92] lets us do
just that.

Theorem 2.1. Let $f(n)$ be a function satisfying $f(n) = O(n^{3/2})$ and
suppose any $P \in \mathcal{P}_{n,n}$ satisfies $C^R(P) = \Omega(f(n))$. Then
any $Q \in \mathcal{P}_n$ satisfies $C^R(Q) = \Omega(f(n))$.

Proof. This is a simple restatement of Theorem 3.5 of [G92]. □

For the purposes of proving a lower bound of $\Omega(n^{4/3} \log^{1/3} n)$, we
may therefore safely concentrate on monotone bipartite graph properties alone.
We now need some definitions.



Definition 2.2 (Basic definitions). An $(m,n)$-bipartite graph $G$ is a graph
whose vertices can be partitioned into two independent sets, denoted $V_L(G)$ and
$V_R(G)$ respectively, of sizes $m$ and $n$ respectively. The edge set of $G$ is
denoted $E(G)$. For such a graph we define

$$\Delta_L(G) = \max_{v \in V_L(G)} \deg_G(v), \qquad
  \delta_L(G) = \frac{1}{|V_L(G)|} \sum_{v \in V_L(G)} \deg_G(v) = \frac{|E(G)|}{|V_L(G)|}.$$

$\Delta_R(G)$ and $\delta_R(G)$ are defined similarly. When $|V_L(G)| = |V_R(G)|$ we
define $\delta(G) = \delta_L(G) = \delta_R(G)$. We define $\overline{G}$ to be the
$(m,n)$-bipartite graph with the same bipartition and with edge set
$V_L(G) \times V_R(G) - E(G)$.
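The quantities of Definition 2.2 are easy to compute on small instances. The following is only an illustrative sketch: the function names `degree_stats` and `complement` and the edge-set representation (a set of (left, right) pairs) are assumptions made here, not notation from the paper.

```python
from itertools import product


def degree_stats(left, right, edges):
    """Return (Delta_L, delta_L, Delta_R, delta_R) for a bipartite graph.

    `left` and `right` are the two vertex classes; `edges` is a set of
    (u, v) pairs with u in `left` and v in `right`.
    """
    deg_left = {u: 0 for u in left}
    deg_right = {v: 0 for v in right}
    for u, v in edges:
        deg_left[u] += 1
        deg_right[v] += 1
    return (
        max(deg_left.values()),   # Delta_L: maximum degree on the left
        len(edges) / len(left),   # delta_L: average degree on the left
        max(deg_right.values()),  # Delta_R: maximum degree on the right
        len(edges) / len(right),  # delta_R: average degree on the right
    )


def complement(left, right, edges):
    """Edge set of the bipartite complement: V_L x V_R minus E(G)."""
    return set(product(left, right)) - set(edges)


left, right = ["a", "b"], ["x", "y", "z"]
edges = {("a", "x"), ("a", "y"), ("b", "y")}
stats = degree_stats(left, right, edges)
co_edges = complement(left, right, edges)
```

In this toy (2,3)-bipartite graph the left side has maximum degree 2 and average degree 3/2, and the complement consists of the three missing pairs.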



Definition 2.3 (Sparseness). The bipartite graph $G$ is said to be L-sparse if
$V_L(G)$ contains at least $\frac{1}{2}|V_L(G)|$ isolated vertices, i.e. vertices of
degree 0. The notion of R-sparseness is defined analogously.

Let $P \in \mathcal{P}_{n,n}$. An $(n,n)$-bipartite graph $G$ is called a minterm of
$P$ if $G$ satisfies $P$ but removing any edge from $G$ yields a graph which does not.
Suppose we associate with $G$ an $n$-tuple $(d_1, d_2, \ldots, d_n)$ with
$d_1 \ge \ldots \ge d_n$ where the $d_i$ are the degrees of the vertices in $V_L(G)$;
we then say that $G$ is an L-first minterm of $P$ if it is a minterm and its
associated $n$-tuple is lexicographically smallest amongst all minterms. We say that
$G$ is an L-first sparse minterm of $P$ if it is a minterm, is L-sparse and its
associated $n$-tuple is lexicographically smallest amongst all L-sparse minterms. We
define R-first minterms and R-first sparse minterms analogously.

Finally, we define the dual of a property $P \in \mathcal{P}_{n,n}$ to be the property
$P^* \in \mathcal{P}_{n,n}$ such that a graph $G$ satisfies $P^*$ iff $\overline{G}$
does not satisfy $P$.

Lemma 2.4. For any $P \in \mathcal{P}_{n,n}$ either $P$ or $P^*$ has an R-sparse
minterm.

Proof. Let $G$ be an edge-maximal R-sparse $(n,n)$-bipartite graph. Then
$\overline{G}$ is isomorphic to $G$; therefore $G$ must satisfy either $P$ or $P^*$. □

It is easy to see that any decision tree algorithm for $P$ can be converted into
one for $P^*$; this gives $C^R(P) = C^R(P^*)$. Therefore from now on we shall assume
WLOG that $P$ has an R-sparse minterm. The next theorem summarizes the key
result of Yao [Y87] and an extension of the result due to Hajnal [H91].

Theorem 2.5 ([Y87, H91]). For $P \in \mathcal{P}_{n,n}$, the following hold:

(1) If $G$ is a minterm of $P$ then $C^R(P) = \Omega(|E(G)|)$.

(2) If $G$ is either an L-first minterm or an L-first sparse minterm, then

$$C^R(P) = \Omega(n \Delta_L(G) / \delta_L(G)),$$

and a similar statement holds for R-first minterms and R-first sparse minterms. □



2.1 Graph Packing 

We now introduce the key graph theoretic concept which we shall need. Let us
say that graphs $G$ and $H$ can be packed if there is a way to identify their vertices
without identifying any edge of $G$ with an edge of $H$. Such an identification,
when it exists, shall be called a packing of $G$ and $H$. To see the relevance of this
concept, consider the case when $G$ and $H$ are minterms of $P$ and $P^*$ respectively,
for some property $P \in \mathcal{P}_{n,n}$. To say that $G$ and $H$ can be packed is
equivalent to saying that $G$ is isomorphic to a subgraph of $\overline{H}$. Now from
monotonicity and the definition of dual properties one can see that this gives rise
to a contradiction. These ideas are formalized in the next definition³ and the
following theorem.

Definition 2.6. Let $G$ and $H$ be $(m,n)$-bipartite graphs. A packing of $G$ and $H$
is a pair of bijections $\varphi_L : V_L(G) \to V_L(H)$ and
$\varphi_R : V_R(G) \to V_R(H)$ such that for any $x \in V_L(G)$ and $y \in V_R(G)$,
either $(x,y) \notin E(G)$ or $(\varphi_L(x), \varphi_R(y)) \notin E(H)$. We say that
$G$ and $H$ can be packed if there exists such a packing.
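To make Definition 2.6 concrete, here is a brute-force sketch that checks whether a given pair of bijections is a packing and searches for one by trying all permutations. It is meant only for tiny examples; the names `is_packing` and `find_packing` are hypothetical, and exhaustive search is not the method used in the paper.

```python
from itertools import permutations


def is_packing(phi_l, phi_r, edges_g, edges_h):
    """True if no edge (x, y) of G is mapped onto an edge of H."""
    return all((phi_l[x], phi_r[y]) not in edges_h for x, y in edges_g)


def find_packing(left_g, right_g, edges_g, left_h, right_h, edges_h):
    """Exhaustively search for a packing; return (phi_l, phi_r) or None."""
    for image_l in permutations(left_h):
        phi_l = dict(zip(left_g, image_l))
        for image_r in permutations(right_h):
            phi_r = dict(zip(right_g, image_r))
            if is_packing(phi_l, phi_r, edges_g, edges_h):
                return phi_l, phi_r
    return None


sides = [0, 1]
one_edge = {(0, 0)}
complete = {(u, v) for u in sides for v in sides}

# Two single-edge graphs can be packed by moving one edge off the other.
found = find_packing(sides, sides, one_edge, sides, sides, one_edge)
# A graph with an edge can never be packed with the complete bipartite graph.
blocked = find_packing(sides, sides, one_edge, sides, sides, complete)
```

The second call illustrates the obstruction behind Theorem 2.7: when every pair is an edge of H, any edge of G must land on an edge of H.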



Theorem 2.7 ([Y87]). For $P \in \mathcal{P}_{n,n}$, let $G$ be a minterm of $P$ and
$H$ be a minterm of $P^*$. Then $G$ and $H$ cannot be packed. □

³ We have defined the notion of packing only for bipartite graphs here because that is
all we need. In the literature, packing has been studied both for general graphs as
well as bipartite graphs.

2.2 Outline of the Proof of the Main Theorem

We are now ready to outline the proof of Theorem 1.1. Let $P \in \mathcal{P}_{n,n}$
and let $q = q(n)$ be a parameter to be fixed later. We wish to prove that
$C^R(P) = \Omega(nq)$. Suppose this is not the case. Let $G$ be an R-first sparse
minterm of $P$ and $H$ be an L-first minterm of $P^*$. By part (1) of Theorem 2.5
the following conditions hold:

$$\delta(G) \le q, \qquad \delta(H) \le q.$$

Using these in part (2) of Theorem 2.5 gives us the following additional
conditions:

$$\Delta_R(G) \le q^2, \qquad \Delta_L(H) \le q^2.$$

What we would like to show is that for an appropriate choice of $q$, these
conditions imply that $G$ and $H$ can be packed. Then by Theorem 2.7 we would have
a contradiction.

The above framework is the same as that used by Hajnal [H91]. Our
improvement is in the parameters of the packing lemma. Our improved lemma
says:

Lemma 2.8 (Packing Lemma). Set q = {^enlogny^^ . Let G and H be 
$(n,n)$-bipartite graphs with $\delta(G) \le q$, $\delta(H) \le q$,
$\Delta_R(G) \le q^2$ and $\Delta_L(H) \le q^2$. Furthermore, suppose $G$ is R-sparse.
Then if $\varepsilon$ is a small enough constant, $G$ and $H$ can be packed.

Remark. This is a stronger result than that of Hajnal and Szegedy [HS92]; this
makes the lemma interesting on its own.

As outlined above, proving this lemma will establish that
$C^R(P) = \Omega(nq) = \Omega(n^{4/3} \log^{1/3} n)$ with the above choice of $q$.
This will prove the Main Theorem 1.1. The rest of the paper will therefore be
devoted to proving this lemma.



3 A Technical Lemma 

The proof of our improved packing lemma will depend on a probabilistic fact: 
specifically, a tail inequality. We shall be interested in considering set systems 
with the property that a small random sample of the ground set is unlikely to 
hit too many of the sets in the system. More precisely, the property is that the 
number of sets missed by the sample is only a constant factor below the expected 
number, with high probability. The purpose of this section is to establish that 
certain simple conditions, if satisfied by the set system, guarantee this type of 
property. As it turns out, it suffices to upper bound the size of each set, and 
the maximum and average number of sets containing each element of the ground 
set; of course we also need a large enough set system over a large enough ground 
set. We shall call a set system favourable — with the appropriate parameters — 
if it satisfies these conditions. 
Assumption 3.1. Throughout this section we assume that $n$ is large enough and
that $r(n)$, $s(n)$ and $t(n)$ are integer valued functions in $o(n) \cap \omega(1)$.



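Definition 3.2. Let $V$ be a finite set and let $\mathcal{F} \subseteq 2^V$ be a set
system on ground set $V$. We say that $\mathcal{F}$ is
$(n, r(n), s(n), \bar{s}(n))$-favourable if

$$|V| = n, \qquad |\mathcal{F}| \ge n,$$
$$\forall F \in \mathcal{F}: \; |F| \le r(n),$$
$$\forall v \in V: \; |\{F \in \mathcal{F} : v \in F\}| \le s(n), \text{ and}$$
$$\frac{1}{n} \sum_{v \in V} |\{F \in \mathcal{F} : v \in F\}| \le \bar{s}(n).$$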
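The four conditions of Definition 3.2 can be checked mechanically for a concrete set system. The sketch below assumes that the parameters are passed as plain numbers rather than as functions of n and that every set in the family is a subset of the ground set; `is_favourable` is a name chosen here for illustration.

```python
def is_favourable(ground, family, r, s, s_bar):
    """Check the conditions of Definition 3.2 for fixed numeric parameters.

    `ground` is the ground set V, `family` a list of subsets of V,
    `r` bounds the size of each set, `s` the maximum and `s_bar` the
    average number of sets containing an element of V.
    """
    n = len(ground)
    if len(family) < n:                      # need at least n sets
        return False
    if any(len(f) > r for f in family):      # every set has size <= r
        return False
    cover = {v: 0 for v in ground}           # number of sets containing v
    for f in family:
        for v in f:
            cover[v] += 1
    return max(cover.values()) <= s and sum(cover.values()) / n <= s_bar


ground = list(range(6))
# Six cyclic "intervals" of length 2: {0,1}, {1,2}, ..., {5,0}.
family = [{i, (i + 1) % 6} for i in ground]
```

Every point lies in exactly two of these intervals, so the family passes with r = s = s_bar = 2 and fails as soon as any one of the three bounds is lowered.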

Now consider a set system $\mathcal{F}$ on ground set $V$, and a function $t(n)$.
Let $S$ be a random subset of $V$ of size $t(n)$, chosen uniformly from all subsets
of size $t(n)$. Our Technical Lemma is concerned with the behaviour of the following
random variable:

$$X(\mathcal{F}; t(n)) = |\{F \in \mathcal{F} : F \cap S = \emptyset\}|. \qquad (1)$$



Lemma 3.3 (Technical Lemma). Let P be {n,r{n), s{n),s{n))-favourable. 
Suppose r{n)t{n) < |enlogn, and t{n)s{n)s{n) < for some constant 

e > 0. Then we have 



Pr 



X{P;t{n)) < -n^ 



< 



Example. It may help to first think about a concrete example of a favourable 
set system and what the lemma says about it. Consider the ground set V = 
{1,2, ...,n} and let P be the collection of all n intervals of size (say) 

with wrap-around (i.e. n and 1 are consecutive). Every interval is of size 
and each point of the ground set belongs to intervals. Therefore this P 

is (n, 6n^/‘*)-favourable. A straightforward calculation shows that a 

random subset of V of size (say) is expected to be disjoint from fl(n) of 

these intervals. From the above lemma we can conclude, in particular, that it is
disjoint from $\Omega(n^{0.99})$ intervals, with “high” probability.
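The random variable X of equation (1) can be explored empirically on the interval example. The sketch below is a Monte Carlo illustration only: the interval length and sample size are free parameters chosen by the caller (the specific exponents used in the example above are not assumed here), and the helper names are hypothetical.

```python
import random


def cyclic_intervals(n, length):
    """All n intervals of the given length on {0, ..., n-1}, with wrap-around."""
    return [frozenset((start + j) % n for j in range(length)) for start in range(n)]


def missed_sets(family, sample):
    """Number of sets in the family that are disjoint from the sample."""
    sample = set(sample)
    return sum(1 for f in family if sample.isdisjoint(f))


def simulate(n, length, t, trials, seed=0):
    """Draw `trials` uniform t-subsets of the ground set and record X for each."""
    rng = random.Random(seed)
    family = cyclic_intervals(n, length)
    return [missed_sets(family, rng.sample(range(n), t)) for _ in range(trials)]


family = cyclic_intervals(10, 3)
values = simulate(n=200, length=5, t=10, trials=20)
```

Here `missed_sets(family, S)` is exactly X for a fixed sample S; a single sampled point hits the three intervals containing it and misses the other seven. Comparing the smallest observed value in `values` with the average gives a rough feel for the concentration the Technical Lemma asserts.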



In order to prove Lemma 3.3 we define another random variable $Y$ which
“behaves like” $X(\mathcal{F}; t(n))$ and is easier to handle. Let us number the
elements of $V$ from 1 to $n$ and set $p = 2t(n)/n$. Let $Z_1, \ldots, Z_n$ be i.i.d.
boolean random variables with $\Pr[Z_i = 1] = p$. Let $S' \subseteq V$ be the random
subset given by $S' = \{i : Z_i = 1\}$ and define

$$Y = |\{F \in \mathcal{F} : F \cap S' = \emptyset\}|. \qquad (2)$$



The next lemma connects Y with X. 
Lemma 3.4. For any $a$ we have $\Pr[X(\mathcal{F}; t(n)) < a] \le 2 \cdot \Pr[Y < a]$.

Proof. We proceed as in IH91I . For 0 < k < n let TTk = Pr[X k) < a]. Observe 
that ttq < 7Ti < • • • < TTn- Let A = [^np\ and B = [fnpj . We have 

Pr[y < a] = ^ ^ ~ pT~^ > ■ 

Noting that A = t{n) completes the proof. □ 



Lemma 3.5. Under the hypotheses of the Technical Lemma, E\Y] > 

Proof We have E[Y] = ^ ''^(1 — using the fact that T is 

(n, r(n), s(n), s(n))-favourable. For any constant a > 1, we have (1— > e““ 
for large enough n. Therefore 

E[Y] > n ({1 - p^/P^ ^ 

since by hypothesis t{n)r{n) < ^enlogn. Choosing a = 2 In 2 completes the 
proof. □ 

Now that we know that $Y$ has high expectation, the main task is to prove
that it does not fall far below its expectation too often. To this end we would
like to consider an exposure martingale corresponding to $Y$ that is obtained by
revealing the values of $Z_i$ one at a time. For $i = 0, 1, \ldots, n$, we define
random variables $Y_i = Y_i(Z_1, \ldots, Z_i)$:

$$Y_i(z_1, \ldots, z_i) = E[Y \mid Z_1 = z_1, \ldots, Z_i = z_i], \quad
  \forall z_1, \ldots, z_i \in \{0,1\} \qquad (3)$$

where the expectation is taken over $Z_{i+1}, \ldots, Z_n$. It is clear that
$Y_{i-1}(z_1, \ldots, z_{i-1}) = (1-p)\,Y_i(z_1, \ldots, z_{i-1}, 0) + p\,Y_i(z_1, \ldots, z_{i-1}, 1)$.
Therefore, defining another set of random variables
$D_i = D_i(Z_1, \ldots, Z_{i-1})$ by

$$D_i(z_1, \ldots, z_{i-1}) := Y_i(z_1, \ldots, z_{i-1}, 0) - Y_i(z_1, \ldots, z_{i-1}, 1), \quad
  \forall z_1, \ldots, z_{i-1} \in \{0,1\}$$

gives

$$Y_i(z_1, \ldots, z_i) = \begin{cases}
  Y_{i-1}(z_1, \ldots, z_{i-1}) + p\,D_i(z_1, \ldots, z_{i-1}), & \text{if } z_i = 0 \\
  Y_{i-1}(z_1, \ldots, z_{i-1}) - (1-p)\,D_i(z_1, \ldots, z_{i-1}), & \text{if } z_i = 1
\end{cases} \qquad (4)$$

that is, $Y_i = Y_{i-1} + (p(1 - Z_i) - (1-p)Z_i)\,D_i = Y_{i-1} + (p - Z_i)\,D_i$,
whence

$$Y = E[Y] + \sum_{i=1}^{n} (p - Z_i)\,D_i. \qquad (5)$$
To bound the random variables $D_i$, note that for any fixed $z_1, \ldots, z_{i-1}$,
the quantity $D_i$ is a convex combination of the quantities

$$Y(z_1, \ldots, z_{i-1}, 0, z_{i+1}, \ldots, z_n) - Y(z_1, \ldots, z_{i-1}, 1, z_{i+1}, \ldots, z_n) \qquad (6)$$

where $(z_{i+1}, \ldots, z_n)$ varies over all tuples in $\{0,1\}^{n-i}$. From the
definition of $Y$ in (2), it is clear that each of the quantities (6) lies between 0
and $d_i$ where

$$d_i := |\{F \in \mathcal{F} : i \in F\}|.$$

Therefore $0 \le D_i \le d_i$.



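Lemma 3.6. With $Z_1, \ldots, Z_n$ as above, let $c_1, \ldots, c_n$ be positive
integers, let $C_i = C_i(Z_1, \ldots, Z_{i-1})$ be real functions satisfying
$0 \le C_i \le c_i$ and let $\lambda > 0$ be an arbitrary real. Define
$\Delta = \max_{i=1}^{n} c_i$ and
$\delta = \max\{\lambda^2, \frac{1}{n} \sum_{i=1}^{n} c_i\}$. Then, if
$pn \ge \Delta$ we have

$$\Pr\left[\sum_{i=1}^{n} (p - Z_i)\,C_i < -\lambda \sqrt{pn\,\delta \Delta \log n}\right] < n^{-\lambda^2/6}.$$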

Remark. The lemma is interesting because (1) we are summing dependent random 
variables and (2) it is a martingale inequality that does not follow from Azuma’s 
inequality. 



Proof. To simplify the proof, we assume that the Cfs are integer- valued; this as- 
sumption can easily be removed by discretizing with denominator and rescal- 
ing. For 1 < i < n, 1 < j < di, define random variables Wij as follows 



W2J = 



j P- Z2 

\0 > 



ifj<0 

ifj>0 



The key observation is that the nonzero Wij’s for distinct i are independent, 
because the Zfs are. Therefore, if we set i = nS/A, the Wij’s can be arranged 
in an ^ X A matrix such that the nonzero random variables in each column 
are independent 0 Fix a column and discard the zero entries; suppose m entries 
remain and sum to S. Standard Chernoff bounds (e.g. see fASFnq . Theorems 
A. 11 and A. 12) imply that for any a > 0, 6 > 1: 

Pr|S <-<■]< exp (- , end (7) 

Pr[S' < —{b— l)pm] < [e^~^b~^Y'^ ■ (8) 

Set a = Xy/pllogn and 6 = 1-1- a/ pm. Suppose a < ‘^pm. From Q we immedi- 
ately obtain 

Pr[S' < — A-\/p^logn] < exp(— ^A^ log^ n) < exp(— ^A^logn) . 

⁴ We pad the matrix with extra zero entries, if necessary, to end up with the required
shape of $\ell \times \Delta$.
Next suppose a > ^pm. Since the real function f{x) = (a; — 1) ^xlnx is increas-
ing for a; > 1, from Q we get 

Pr[5' < -X^/phogn] < exp (/(5/3)-l)logn^ < exp(-^A^ logn) , 

where the last inequality from the facts that pn > A and that S > X^. Thus, in 
each case we have Pr[5' < —X^/pllogn] < n~^ 

To finish the proof we simply note that Zi)Ci is the sum of A such 

random variables S. □ 

We are now ready to prove our technical lemma. 



Proof (of Lemma 3.3). Let us apply Lemma 3.6 with $c_i = d_i$ and $C_i = D_i$.
This choice of parameters gives $\Delta \le s(n)$ and $\delta \le \bar{s}(n)$. Because
of the way we defined $Y$ in (2), increasing $p$ can only increase the quantity
$\Pr[Y < a]$, for any $a$; thus we may safely assume that $pn \ge \Delta$. Recalling
that $p = 2t(n)/n$ we get



Pr 



Y < E[Y] — X^/2t{n)s{n)s{n) 



< n 



-Y/6 



Recall that by hypothesis t{n)s{n)s{n) < n? . Using T;emmas 13.41 a.nd 13.51 we 
then get 



Pr 



X < n} ® — X\/2n?~^ 



< 



2n-^ . 



Setting A > \/T2 yields Pr[A < < n ^ as desired, when n is large enough. 

□ 



4 Proof of the Packing Lemma 



We now return to proving our improved packing lemma (Lemma 2.8). Recall
that from the hypotheses we already have the following degree conditions on the
bipartite graphs $G$ and $H$ we wish to pack:

$$\delta(G) \le q, \quad \delta(H) \le q, \quad \Delta_R(G) \le q^2, \quad \Delta_L(H) \le q^2, \qquad (9)$$



where we have set 




( 10 ) 



where $\varepsilon$ is a small constant to be fixed later. We shall assume throughout
this section that $n$ is large enough.



Definition 4.1. For a subset $W$ of the vertex set of a graph and integer
$k \le |W|$, let $\mathcal{N}(W)$ denote the neighbourhood of $W$. Let
$\mathrm{top}(W, k)$ and $\mathrm{bot}(W, k)$ denote the subsets of $W$ consisting of,
respectively, the $k$ highest and $k$ lowest degree vertices in $W$. For a vertex $x$,
let $\mathcal{N}(x)$ be defined as $\mathcal{N}(\{x\})$.
Following Hajnal [H91], our first step will be to modify $G$ and $H$ suitably
so that even stronger degree conditions hold. Let $k = \min\{n/2, n/(4\delta(H))\}$.
From the hypotheses, we know that $V_R(G)$ has at least $n/2$ isolated vertices; let
$V_1$ be a set of size $n/2$ of these. Let $V_0 = \mathrm{top}(V_L(G), k)$,
$V_2 = \mathrm{bot}(V_L(H), k)$ and
$V_3 = \mathcal{N}(V_2) \cup \mathrm{top}(V_R(H), \frac{n}{2} - |\mathcal{N}(V_2)|)$.
Let us define graphs $G'$ and $H'$ as follows:

$$G' = G - (V_0 \cup V_1); \qquad H' = H - (V_2 \cup V_3).$$

It follows from the construction above that if $G'$ and $H'$ can be packed then
so can $G$ and $H$. This is because having packed $G'$ and $H'$ we may arbitrarily
identify the vertices in $V_0$ with those in $V_2$ and the vertices in $V_1$ with those
in $V_3$. Now, to show that $G'$ and $H'$ can be packed, we shall need the degree
conditions guaranteed by the following lemma.

Lemma 4.2. $G'$ and $H'$ are $(n-k, n/2)$-bipartite graphs with the following
properties:

$$\delta_L(G') \le q, \quad \delta_R(G') \le q, \quad \delta_L(H') \le q, \quad \delta_R(H') \le q,$$
$$\Delta_L(G') \le 4q^2, \quad \Delta_R(G') \le q^2,$$
$$\Delta_L(H') \le q^2, \quad \Delta_R(H') \le 4q.$$

Proof. The first four inequalities are obvious from (9), as are the bounds on
$\Delta_R(G')$ and $\Delta_L(H')$. By construction
$|\mathcal{N}(V_2)| \le k\,\delta(H) \le n/4$; therefore $V_3$ contains at least $n/4$
of the highest degree vertices in $V_R(H)$. Since these vertices are removed to obtain
$H'$ we have $\Delta_R(H') \le 4\delta(H) \le 4q$. Similarly, we have
$\Delta_L(G') \le 4\delta(H)\delta(G) \le 4q^2$. □

We prove that $G'$ and $H'$ can be packed using the probabilistic method: let
$\varphi_L : V_L(G') \to V_L(H')$ be a random bijection; we shall show that with
positive probability there exists a bijection $\varphi_R : V_R(G') \to V_R(H')$ such
that $(\varphi_L, \varphi_R)$ is a packing. Let $\Gamma = \Gamma(\varphi_L)$ be a
bipartite graph on vertex set $(V_R(G'), V_R(H'))$ defined as follows: for
$x \in V_R(G')$, $y \in V_R(H')$ we have $(x,y) \in E(\Gamma)$ iff
$\varphi_L(\mathcal{N}(x)) \cap \mathcal{N}(y) = \emptyset$. It is clear that the
required bijection $\varphi_R$ exists iff the graph $\Gamma$ has a perfect matching.
Our task now is to show that the (random) bipartite graph $\Gamma$ has a perfect
matching with positive probability. The most straightforward way of doing this is to
obtain lower bounds on the degrees of vertices in $\Gamma$ and then apply König’s
Theorem.
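The reduction from packing to perfect matching can be tried out on small graphs. The sketch below builds the compatibility graph from a given left bijection and then looks for a perfect matching with the standard augmenting-path method; note that the paper only argues that a matching exists via degree bounds, so the explicit search and the names `build_gamma` and `perfect_matching` are additions made here for illustration.

```python
def build_gamma(phi_l, nbrs_g, nbrs_h):
    """Compatibility graph between right vertices of G' and of H'.

    nbrs_g[x] is the set of left neighbours of right vertex x in G',
    nbrs_h[y] the set of left neighbours of right vertex y in H'.
    x is joined to y iff the image of N(x) under phi_l avoids N(y).
    """
    gamma = {}
    for x, nx in nbrs_g.items():
        image = {phi_l[u] for u in nx}
        gamma[x] = [y for y, ny in nbrs_h.items() if image.isdisjoint(ny)]
    return gamma


def perfect_matching(gamma):
    """Return a dict x -> y matching every x, or None if none exists."""
    match = {}  # maps a matched y to its partner x

    def augment(x, seen):
        for y in gamma[x]:
            if y in seen:
                continue
            seen.add(y)
            # y is free, or its current partner can be re-matched elsewhere
            if y not in match or augment(match[y], seen):
                match[y] = x
                return True
        return False

    for x in gamma:
        if not augment(x, set()):
            return None
    return {x: y for y, x in match.items()}


phi_l = {0: 0, 1: 1}                   # bijection on the left vertices
nbrs_g = {"a": {0}, "b": set()}        # G': single edge (0, a)
nbrs_h = {"c": {0}, "d": {1}}          # H': edges (0, c) and (1, d)
gamma = build_gamma(phi_l, nbrs_g, nbrs_h)
phi_r = perfect_matching(gamma)
```

When both sides have the same number of vertices, the returned dictionary plays the role of the right bijection: together with `phi_l` it maps no edge of G' onto an edge of H'.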

The next two lemmas establish such lower bounds. It is important to note
that unlike [H91] we exploit the asymmetry between $G'$ and $H'$ in a crucial way;
the degree lower bound for $H'$ is proved along lines similar to [H91], whereas for
$G'$ we need the power of our Technical Lemma.

Lemma 4.3. With probability greater than $\frac{1}{2}$, for every vertex
$y \in V_R(H')$ we have $\deg_\Gamma(y) \ge \frac{n}{2} - 8q^2$.

Proof. Let $y \in V_R(H')$ be arbitrary and let
$S(y) = \varphi_L^{-1}(\mathcal{N}(y))$. Then $S(y)$ is a random subset of $V_L(G')$
of size at most $\Delta_R(H') \le 4q$. Since $\delta_L(G') \le q$, we have

$$E\left[\sum_{v \in S(y)} \deg_{G'}(v)\right] \le 4q^2.$$

This bound on the expectation implies a high probability result proven through
Chernoff bounds in exactly the same manner as Lemma 5.4 in [H91]: we need to
have $\Delta_L(G') = O(n / \log n)$, but this is indeed the case by Lemma 4.2 and our
choice of $q$ in (10). Therefore, we get

$$\Pr\left[\sum_{v \in S(y)} \deg_{G'}(v) \ge 8q^2\right] \le \frac{1}{2n}.$$

Thus $\Pr[|\mathcal{N}(S(y))| \ge 8q^2] \le \frac{1}{2n}$ and so
$\Pr[\deg_\Gamma(y) \le \frac{n}{2} - 8q^2] \le \frac{1}{2n}$. Since
$|V_R(H')| \le n$, the lemma follows. □



Lemma 4.4. With probability greater than for every vertex x G Vr(G") we 
have degp(a:) > 

Proof. Fix e = ^. Let x G Fr(G') be arbitrary and let T{x) = (pL{Af{x)). 

Consider the set system H — {Af{y) ■ y G Vr{H')} on ground set By 

Lemma L.2I we see that His ( | , 4g, , q)-favourable and by (till we have Aq-q^ = 

\en\ogn. Now T{x) is a random subset of Vp(iL') of size \T{x)\ < Afl(G') < q^ 
and q^ ■ q^ ■ q = q^ < Therefore, H and T{x) satisfy the hypotheses of the 

Technical Lemma E3 

Applying the Technical Lemma, we see that the number of sets in H that 
T{x) is disjoint from falls below with probability at most ^ In 

other words Pr[degp(a;) < Noting that > 8q^ gives us the 

desired result. □ 

We now have all the pieces we need to prove the packing lemma. 



Proof (of the Packing Lemma). From Lemma 4.3 and Lemma 4.4 we see
that if the bijection $\varphi_L$ is chosen at random, then with positive probability
the following event occurs:

Va: G Fr(G') degp(a;) > I - , and ^y G Vr{H') degp(?/) > 8g^ . 

Since $\Gamma$ is an $(n/2, n/2)$-bipartite graph, by König’s Theorem, this event is a
sufficient condition for $\Gamma$ to have a perfect matching. Therefore, there exists a
bijection $\varphi_L$ such that $\Gamma$ has a perfect matching; thus there exists a
packing of $G'$ and $H'$. By the discussion preceding Lemma 4.2 we see that $G$ and
$H$ can be packed. □
5 Concluding Remarks

The complexity of graph properties has been studied for a quarter of a century
now. Yet the most basic conjecture, namely an $\Omega(n^2)$ randomized decision tree
complexity lower bound for monotone properties, remains open to this day. As
mentioned before, the results leading towards a settlement of this conjecture
have been very few.

Nine years have passed since the best previously known lower bound was 
established. We believe that this makes our result, a slight improvement of the 
bound, significant for injecting new life into this problem. 

To improve the lower bound further it appears necessary to break out of Ha- 
jnal’s framework. Our Technical Lemma is not constrained by this framework — 
instead, Hajnal’s framework constrains the parameters we are forced to apply 
the lemma with — and we hope that it will be useful in further work on this 
problem. 

Acknowledgments. We are grateful to Professor Andrew Yao for introduc- 
ing us to this fascinating problem. We would like to thank Professor Bernard 
Chazelle for several important comments and suggestions. 
Abstract. We introduce a new imperfect random source that realistically
generalizes the SV-source of Santha and Vazirani [SV86] and the
bit-fixing source of Lichtenstein, Linial and Saks [LLS89]. Our source is
expected to generate a known sequence of (possibly dependent) random
variables (for example, a stream of unbiased random bits). However, the
realizations/observations of these variables could be imperfect in the
following two ways: (1) inevitably, each of the observations could be slightly
biased (due to noise, small measurement errors, imperfections of the
source, etc.), which is characterized by the “statistical noise” parameter
$\delta \in [0, \frac{1}{2}]$, and (2) few of the observations could be completely
incorrect (due to very poor measurement, improper setup, unlikely but certain
internal correlations, etc.), which is characterized by the “number of
errors” parameter $b \ge 0$. While the SV-source considered only scenario
(1), and the bit-fixing source only scenario (2), we believe that our
combined source is more realistic in modeling the problem of extracting
quasi-random bits from physical sources. Unfortunately, we show that
dealing with the combination of scenarios (1) and (2) is dramatically
more difficult (at least from the point of randomness extraction) than
dealing with each scenario individually. For example, if $b\delta = \omega(1)$, the
adversary controlling our source can force the outcome of any bit
extraction procedure to a constant with probability $1 - o(1)$, irrespective of the
random variables, their correlation and the number of observations.

We also apply our source to the question of producing $n$-player collective
coin-flipping protocols secure against adaptive adversaries. While the
optimal non-adaptive adversarial threshold for such protocols is known to
be $n/2$ [BN00], the optimal adaptive threshold is conjectured by Ben-Or
and Linial to be only $O(\sqrt{n})$. We give some evidence towards this
conjecture by showing that there exists no black-box transformation from
a non-adaptively secure coin-flipping protocol (with arbitrary conceivable
parameters) resulting in an adaptively secure protocol tolerating
$\omega(\sqrt{n})$ faulty players.



1 Introduction 

Abstract Problem. Consider the following general problem. A sequence of
dependent random variables $X_1, X_2, \ldots$ is generated (such a sequence is called
a stochastic process). Ideally, each $X_i$ has a known “ideal” conditional
distribution, based on the outcomes of $X_1, \ldots, X_{i-1}$ ($X_i$’s being
independent is a special case). However, the “real” distributions with which the
$X_i$’s are generated could be slightly different from the expected “ideal”
distribution. Moreover, in the applications we discuss later, it is typically the case
that the exact nature of these “imperfections” is almost impossible to estimate or
predict exactly. Therefore, we model them as if being caused by an adversary
$\mathcal{A}$. On the optimistic side, we usually expect the “real” and the “ideal”
stochastic process to be somewhat “close”. In other words, there are some natural
restrictions on the way $\mathcal{A}$ can influence our stochastic process. The general
abstract problem is to determine how much “damage” $\mathcal{A}$ can cause subject to
the restrictions put on it.

F. Orejas, P.G. Spirakis, and J. van Leeuwen (Eds.): ICALP 2001, LNCS 2076, pp. 297-…, 2001.
(c) Springer-Verlag Berlin Heidelberg 2001

Let us now be more specific. Let $N$ be the length of our stochastic process
$\mathcal{P}$, and $D_i = D_i(x_1 \ldots x_{i-1})$ be the ideal conditional distribution
of $X_i$ given $X_1 \ldots X_{i-1} = x_1 \ldots x_{i-1}$. Now, we will study the effects
of two natural imperfections: inevitable (but small!) statistical deviation of each
$X_i$ from $D_i$, and rare (but complete!) errors in the process. More precisely, our
adversary $\mathcal{A}$ knows $\mathcal{P}$ and is given the “noise” parameter
$\delta \in [0, \frac{1}{2}]$ and the “error” parameter $b \ge 0$. Then, for any
$i = 1 \ldots N$ and given $x_1 \ldots x_{i-1}$, $\mathcal{A}$ can influence the ideal
sample of $X_i$ using one of the following rules:

(A) Fix $X_i$ to any constant in the support of $D_i$. This rule is called an
intervention and can be used at most $b$ times.

(B) Sample $X_i$ from any distribution $D'$ (on the same support set) of statistical
distance¹ at most $\delta$ from $D_i$.
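Rule (B) is stated in terms of statistical distance, that is, half the sum of the absolute differences of the point probabilities. As a small illustration, the following sketch computes it for finite distributions given as dictionaries; the function name and the dictionary representation are assumptions made here.

```python
def statistical_distance(p, q):
    """Half the L1 distance between two finite distributions.

    `p` and `q` map outcomes to probabilities; outcomes missing from a
    dictionary are treated as having probability 0.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)


fair = {0: 0.5, 1: 0.5}
biased = {0: 0.25, 1: 0.75}
dist = statistical_distance(fair, biased)
```

For a single bit the distance from the fair coin equals the bias, which is why the noise parameter ranges over the interval from 0 to 1/2.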

We notice that one of the most interesting ideal stochastic processes $\mathcal{P}$
is a sequence of $N$ independent coin flips. In this important example,
$\mathcal{A}$ observes $x_1 \ldots x_{i-1} \in \{0,1\}^{i-1}$, and can affect the next
coin $X_i$ in the following two ways:

(A) Fix $X_i$ to 0 or 1. Such an intervention can be used at most $b$ times.

(B) Bias $X_i$ by any value $\le \delta$, i.e. set $\Pr(x_i = 1)$ anywhere inside
$[\frac{1}{2} - \delta, \frac{1}{2} + \delta]$.²
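The two rules for the coin-flipping case can be turned into a small simulator. This is a minimal sketch under assumptions made here: the adversary is modelled as a callback returning either `("fix", bit)` or `("bias", p)`, and the example strategy `push_towards_one` is a hypothetical greedy adversary, not one analysed in the paper.

```python
import random


def bcl_stream(n_bits, delta, b, strategy, rng):
    """Generate n_bits coin flips under an adversary obeying rules (A) and (B).

    `strategy(prefix, interventions_left)` returns ("fix", bit) to spend an
    intervention, or ("bias", p) to flip a coin with Pr[1] = p, where p
    must lie within delta of 1/2.
    """
    out, left = [], b
    for _ in range(n_bits):
        kind, value = strategy(tuple(out), left)
        if kind == "fix":
            if left == 0:
                raise ValueError("intervention budget exhausted")
            left -= 1
            out.append(value)
        else:
            if abs(value - 0.5) > delta + 1e-12:
                raise ValueError("bias exceeds the noise parameter")
            out.append(1 if rng.random() < value else 0)
    return out


def push_towards_one(delta):
    """Hypothetical adversary: always bias upwards, fix to 1 after seeing a 0."""
    def strategy(prefix, interventions_left):
        if interventions_left > 0 and prefix and prefix[-1] == 0:
            return ("fix", 1)
        return ("bias", 0.5 + delta)
    return strategy


rng = random.Random(1)
sample = bcl_stream(20, 0.1, 3, push_towards_one(0.1), rng)
```

The simulator enforces both restrictions: at most b interventions in total and a bias of at most delta on every other flip. With delta = 1/2 the adversary controls every bit even without interventions, which matches the extreme end of the noise range.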

We remark that in most of our applications, $\delta$, $b$ and $N$ will be functions of
some other implicit parameter (clear from the context). In such cases we will use
asymptotic notation in this implicit parameter (i.e., $O(\cdot)$, $\Omega(\cdot)$,
$o(\cdot)$, $\omega(\cdot)$).

Motivation. We will show that the abstract setting described above turns out 
to be very relevant in at least the following three areas: (1) imperfect random 
sources; (2) discrete control processes; (3) black-box transformations from stat- 
ically to adaptively secure distributed protocols. While each of the above appli- 
cations will later deserve a separate section, we give a brief introduction now. 

Imperfect Random Sources. A convenient abstraction in the design and analysis
of various randomized algorithms is that the algorithm in question is given a
stream of completely unbiased and independent random bits. In implementations,
this stream has to be generated by some physical source of randomness.
And obviously, such a physical source is unlikely to be “perfect”: the
realizations/observations of the randomness that it produces are bound to deviate
from the “ideal” expected distribution (which actually need not be a sequence
of unbiased random bits, even though the latter is an important special case).
In particular, the following two “imperfections” are extremely natural: (1)
inevitably, each of the observations could be slightly biased (due to noise, small
measurement errors, etc.), and (2) few of the observations could be completely
incorrect (due to poor measurement, improper setup, unlikely but certain
internal correlations, etc.). Our abstract problem perfectly models the situation.
Inevitable small noise is modeled by the “statistical noise parameter”
$\delta \in [0, \frac{1}{2}]$ and the ability of the adversary to apply rule (B) above.
Few total errors are modeled by the “number of errors” parameter $b \ge 0$ and the
ability of the adversary to apply rule (A) a limited (at most $b$) number of times.
While each of the imperfections alone seems to be insufficient to model a typical
physical source, their combination seems to be much more realistic.

¹ Recall that the statistical distance between random variables $Z$ and $W$ over a
domain $A$ is $\|Z - W\| = \frac{1}{2} \cdot \sum_{a \in A} |\Pr(Z = a) - \Pr(W = a)|$.
The same notation stands for the distributions generating $Z$ and $W$.

² Recall, a bias of a bit $c$ is indeed defined to be $|\Pr(c = 1) - \frac{1}{2}|$.

We also remark that the main question we address when looking at our 
problem from this angle is that of randomness extraction: can we apply some 
deterministic function to the observed output of our source so as to obtain nearly 
perfect randomness (even a single almost random bit!), despite the malicious 
behavior of the adversary? 

Discrete Control Processes. This application is very similar to the above, except
it looks at the problem from a different angle. Namely, given some random
process $\mathcal{P}$, the question asked is how much “influence” over $\mathcal{P}$ is
needed so as to force some desired event $\mathcal{E}$ (a function of $\mathcal{P}$’s
output) to happen. In this sense, our adversary $\mathcal{A}$ can be seen as a
controller trying to minimize the usage of its “resources” and still force
$\mathcal{E}$ to happen. Again, our problem models the situation quite naturally.
Rule (A), where $\mathcal{A}$ can completely fix the outcome of $X_i$, is the
“expensive” resource that $\mathcal{A}$ tries to minimize. It also explains why we call
rule (A) an “intervention”. On the other hand, rule (B), where $\mathcal{A}$ can just
slightly (or not at all if $\delta = 0$) affect each $X_i$, can be viewed as something
that takes “no effort” for $\mathcal{A}$ to perform. A good analogy in this scenario
could be that rule (A) corresponds to a “sharp turn” or “changing highway”, while
rule (B) corresponds to “casual steering” or “changing lane”.

While the “rules of the game” are the same as for imperfect random sources,
the main difference is in the question addressed: given the desired event
$\mathcal{E}$, either tell the smallest expected number of $\mathcal{A}$’s interventions
so as to guarantee $\mathcal{E}$, or tell the smallest probability of $\mathcal{A}$’s
failure for a fixed number of interventions.

Black-box Transformations. Assume we have a distributed protocol for $n$ players
to flip a coin or, more generally, to sample some distribution $D$. As usual, we
can assume that some number of players (say, $b$) is malicious, and is controlled
by a central adversary $\mathcal{A}$. As a result, the malicious players can somewhat
bias the resulting distribution $D'$, and we try to design protocols where such bias
is guaranteed to be small. A crucial distinction made in the design of such
protocols is whether the adversary $\mathcal{A}$ is static or adaptive. In the former
case $\mathcal{A}$ has to decide which $b$ players to “corrupt” before the protocol
starts, while in the latter case it can make these decisions dynamically in the
course of the execution. It turns out that it is significantly easier to design
statically secure protocols than their dynamic counterparts. The question we address
is whether and when it is possible to achieve adaptive security “for free”. More
specifically, can we transform some good (family of) statically secure protocol(s)
$\Pi$ so as to obtain a reasonable adaptively secure protocol $\Phi$ (for the same or
related task)? Moreover, the proof of $\Phi$’s adaptive security should only depend on
the knowledge that $\Pi$ is statically secure and not on any other specifics about
$\Pi$ (precisely, we will only assume that static $\mathcal{A}$ corrupting $b$ players
can bias the outcome of $\Pi$ by at most $\delta \in [0, \frac{1}{2}]$).

We formalize this using the notion of a black-box transformation. Namely,
we will sequentially run (various protocols in) $\Pi$ many (say, $N$) times against
an adaptive adversary $\mathcal{A}$ who can dynamically corrupt up to $b$ players.
These runs were expected to produce outputs $X_1 \ldots X_N$. Of course, since we run a
static protocol against an adaptive adversary, some of the $X_i$’s might be very
biased. However, $\mathcal{A}$ can corrupt at most $b$ players! Thus, at least
$(N - b)$ of the subprotocols were effectively run against a static adversary, and
therefore produced outputs with the bias at most $\delta$. But then, even if the other
$b$ runs produced $X_i$’s which were completely fixed by $\mathcal{A}$ (which actually
happens, say, in current static coin-flipping protocols), we exactly get our abstract
problem!

The question addressed here is for which setting of parameters such black-box
transformations are possible.

Our Results. We will study our abstract problem and show that the adversary
is quite powerful for essentially any non-trivial setting of parameters. In
particular, applying our results to the three motivating applications above, we show
the following. If $b\delta = \omega(1)$ and independent of the number of samples $N$:
(1) no “non-trivial” distribution $Y$ (e.g., a random bit) can be sampled from our
imperfect source; (2) any “non-constant” event $\mathcal{E}$ can be forced with
probability $1 - o(1)$ (alternatively, $O(1/\delta)$ expected interventions suffice to
force $\mathcal{E}$); (3) no black-box transformation can result in a “non-trivial”
adaptive sampling protocol. The latter result is extended to show that no black-box
transformation from any statically secure $n$-player coin-flipping protocol can result
in an adaptively secure coin-flipping protocol tolerating $\omega(\sqrt{n})$
corruptions, giving support to a conjecture of Ben-Or and Linial [BL90].

Organization. While all our results hold for general stochastic processes, the
special case when $\mathcal{P}$ is a stream of unbiased bits will turn out to be quite
representative of the general situation, but will substantially simplify the
presentation. Therefore, we will mainly restrict our attention to the stream of
unbiased bits.

In Section 2 we talk about the “imperfect random source view” on our problem.
In particular, we completely characterize the (im)possibility of bit extraction
from our bias-control limited (BCL) source. Next, in Section 3 we view our
source as a discrete control process. We derive tight bounds on how “influential”
our adversary (or the controller) is in this regard. In Section 4 we have our main
application to collective coin-flipping: impossibility of black-box transformations
from statically to good adaptively secure protocols.

2 Imperfect Random Sources 

Prior Work. Much work has been done on imperfect random sources. Due to
space constraints, we survey only the history of streaming sources relevant to us.
Like the ideal source, such sources produce a stream of bits (recall, we will talk
only about bits for simplicity) incrementally over time, but these bits are not
necessarily unbiased or independent. Perhaps the first streaming source goes
all the way back to von Neumann, who showed how to extract perfect
random bits from a sequence of independent coin tosses of the same biased coin
(of unknown bias). Elias [E72] showed how to improve this result and extract
perfect random bits at the optimal rate. Blum [B86] relaxed the independence
requirement on the source by considering streaming sources generated by
finite-state Markov chains (of unknown structure).

The next important development was made by Santha and Vazirani [SV86],
who considered a more general streaming source, called a semi-random source (or
an SV-source). In this source each subsequent bit can be arbitrarily correlated
with all the previous bits, as long as it has some uncertainty. More specifically,
the source is specified by the “noise” parameter $\delta \in [0, \frac{1}{2}]$, and can produce any
sequence $x_1, x_2, \ldots$ as long as $\Pr(x_i = 1 \mid x_1 \ldots x_{i-1}) \in [\frac{1}{2} - \delta, \frac{1}{2} + \delta]$. This source
tries to model the fact that physical sources can never produce completely perfect
bits (anyway, our observation of such sources is bound to introduce some noise).
Alternatively, the stream of bits could be produced by a distributive coin-flipping
protocol [BL90], where a few malicious players can slightly bias each of the bits.

In a parallel development, Lichtenstein, Linial and Saks [LLS89] considered
another streaming source, called the (adaptive) bit-fixing source. In this source
(characterized by the “number of errors” parameter $b$) each next bit, depending
on the previous bits, can be either perfectly random (which is one of the main
limitations of this source) or completely fixed to 0 or 1. The only constraint is
that at most $b$ of the bits are fixed. This source tries to model the situation that
some of the bits generated by a physical source could be determined from the
previous bits, even though we assume that this does not happen very frequently (at
most $b$ times). Alternatively, it relates to the study of “discrete control processes”
that we mentioned earlier, as well as to the problem of adaptive coin-flipping
where each player sends at most one bit (see Section 4).

Our Source. As we have already seen, our new streaming source examines the
implications of having both the problems of “constant small noise” and “rare total
errors”, naturally generalizing the random sources of [SV86, LLS89]. Interestingly, we
will show that having both imperfections together is significantly more difficult
to deal with than having either of them individually, but first we need some
notation. We call our source (given by $\delta$, $b$, $N$ and a particular adversary A obeying
rules (A) and (B)) Bias-Control Limited, or simply a $(\delta, b, N)$-BCL source. Notice,
$b = 0$ (or applying only rule (B)) corresponds to the SV-source, $\delta = 0$ (or
applying only rule (A)) yields the bit-fixing source, while $b = \delta = 0$ gives perfect
randomness. Now we can quantitatively measure the “goodness” of our source
for the problem of bit extraction.

Definition 1. Let A be some $(\delta, b, N)$-BCL source, and $f : \{0,1\}^N \to \{0,1\}$ be
a bit extraction function. Let

- $q(\delta, b, N, f, A)$ be the bias of $f(x)$, where $x = x_1 \ldots x_N$ was produced by A.

- $q(\delta, b, N, f) = \max_A q(\delta, b, N, f, A)$ (taken over all $(\delta, b, N)$-BCL sources A).

- $q(\delta, b, N) = \min_f q(\delta, b, N, f)$ (taken over all $f : \{0,1\}^N \to \{0,1\}$).

Thus, $q(\delta, b, N)$ is the smallest bias of a coin that can be extracted from any
$(\delta, b, N)$-BCL source.
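
For very small $N$, the quantity $q(\delta, b, N, f)$ can be computed exactly by backward induction over the tree of prefixes. The sketch below is illustrative only: it assumes that rule (A) spends one intervention to fix the next bit, that rule (B) lets the adversary set the probability of the next bit anywhere in $[\frac{1}{2} - \delta, \frac{1}{2} + \delta]$, and that the bias of a coin is $|\Pr(f(x) = 1) - \frac{1}{2}|$. The function names are hypothetical.

```python
from functools import lru_cache


def max_prob_one(f, n, delta, b):
    """Largest Pr[f(x) = 1] an adversary with b interventions can force on n bits."""

    @lru_cache(maxsize=None)
    def value(prefix, left):
        if len(prefix) == n:
            return float(f(prefix))
        v0 = value(prefix + (0,), left)
        v1 = value(prefix + (1,), left)
        # Rule (B): tilt the next bit by delta towards the better continuation.
        best = (0.5 + delta) * max(v0, v1) + (0.5 - delta) * min(v0, v1)
        if left > 0:
            # Rule (A): spend one intervention and fix the next bit outright.
            best = max(best,
                       value(prefix + (0,), left - 1),
                       value(prefix + (1,), left - 1))
        return best

    return value((), b)


def worst_case_bias(f, n, delta, b):
    """q(delta, b, n, f): the adversary may push the output towards 1 or towards 0."""
    up = max_prob_one(f, n, delta, b)
    down = 1.0 - max_prob_one(lambda x: 1 - f(x), n, delta, b)
    return max(up - 0.5, 0.5 - down)


def parity(x):
    return sum(x) % 2


def majority(x):
    return int(2 * sum(x) > len(x))
```

Since the state space has size $2^N \cdot (b+1)$, this is only usable for toy parameters, but it makes the max over adversaries in the definition concrete.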

We will say that one can extract an almost perfect bit from a $(\delta, b, N)$-BCL
source if $q(\delta, b, N) = o(1)$, and a slightly random bit if $q(\delta, b, N) \le \frac{1}{2} - \Omega(1)$.
We will now survey the known results about the SV-source and the bit-fixing
source, and then parallel them with our results.

Extraction from the Bit-Fixing Source. Recall, the bit-fixing source of [LLS89]
corresponds to having $b$ interventions and $\delta = 0$. Notice that if we let $f$
be the majority function, we can tolerate $b = O(\sqrt{N})$, since any $c\sqrt{N}$ bits (for
a small enough constant $c$) do not influence the resulting (almost random) value
of majority with probability $1 - o(1)$. Remarkably enough, Lichtenstein, Linial
and Saks [LLS89] actually showed that this is the best bit extraction possible.

Theorem 1 ([LLS89]). Majority is the best bit extraction function for the
bit-fixing source: $q(0, c_1\sqrt{N}, N) = o(1)$, while $q(0, c_2\sqrt{N}, N) = \frac{1}{2} - o(1)$ ($c_1 < c_2$).

As a side note, a random function $f : \{0,1\}^N \to \{0,1\}$ is a terrible bit
extraction function for the bit-fixing source even for $b = \omega(1)$. Indeed, with high
probability the first $(N - b)$ bits do not fix $f$, so A can use the last $b$ interventions
to fix $f$. Another terrible function (even for $b = 1$) is any parity function: it can
be fixed by fixing the last emitted bit.

Extraction from the SV-source. Recall, the SV-source [SV86] corresponds to
having $b = 0$, and where $\Pr(x_i = 1 \mid x_1 \ldots x_{i-1}) \in [\frac{1}{2} - \delta, \frac{1}{2} + \delta]$. On the negative
side, Santha and Vazirani showed that one cannot extract a bit whose bias is
less than $\delta$. In other words, many samples (i.e., large $N$) from the source do not
help: outputting $x_1$ is as good as we can get! Notationally,

Theorem 2 ([SV86]). $q(\delta, 0, N) = \delta$. Thus, one can extract an almost perfect
bit iff $\delta = o(1)$, and a slightly random bit iff $\delta = \frac{1}{2} - \Omega(1)$.

Clearly, there are many (optimal) functions that extract a $\delta$-biased coin from
any SV-source: for example, any parity function will do. In fact, Boppana and
Narayanan [BN96] (extending the ideas of [AR89]) show that a vast majority of
boolean functions from $N$ bits extract a slightly random bit (provided, of course,
$\delta = \frac{1}{2} - \Omega(1)$). Unfortunately, majority is not one of these functions (unless
$\delta \ll 1/\sqrt{N}$, which will turn out to be important soon). Indeed, if our source
always sets the 1-probability of the next bit to be $\frac{1}{2} + \delta$, the resulting bit will be
1 with probability $1 - o(1)$. In fact, Alon and Rabin [AR89] showed that majority
is the worst bit extracting function. Namely, $q(\delta, 0, N, \mathrm{majority}) \ge q(\delta, 0, N, f)$
for any $f$.

Extraction from Our Source. Looking at the extreme cases of our source ($\delta = 0$
and $b = 0$), we notice that somewhat reasonable bit extraction (at least of slightly
random bits) is possible for both of them. However, the extraction functions
are diametrically opposite. For the bit-fixing source the best function was the
majority, and a random function (or any parity function) was terrible, while for
the SV-source a random function was good (and any parity function is optimal),
and the majority was the worst. Hence, the best bit extractor becomes the worst
and vice versa! One may wonder if some extraction function can work reasonably
well for both of these extreme cases, and hopefully provide a good extraction
for our combined source as well. Unfortunately, we show that such a magic
function does not exist for (any “interesting” setting of) our combined source.
The following theorem follows from Theorem 4 in Section 3.

Theorem 3. If $b\delta = \omega(1)$, then it is impossible to extract a slightly random bit
from a $(\delta, b, N)$-BCL source, irrespective of the value of $N$! More precisely,

$$q(\delta, b, N) \ge \frac{1}{2} - \frac{2}{(1 + 2\delta)^b} = \frac{1}{2} - 2^{-\Omega(\delta b)} \qquad (1)$$

In particular, while for $\delta = 0$ we could tolerate $b = O(\sqrt{N})$, and for $b = 0$ could
deal with $\delta \le \frac{1}{2} - \Omega(1)$, now we cannot tolerate $b \to \infty$ for any (constant) $\delta > 0$,
no matter how large $N$ is. Notice also that the worst-case bias of any extracted
coin approaches $\frac{1}{2}$ exponentially fast as $b$ grows.

Tightness. First, given $b$ and $\delta$, let us see under which conditions on $N$ the
majority on $N$ bits will be a good bit extraction for the $(\delta, b, N)$-BCL source. A
moment's look at the binomial distribution reveals that if $N \ll b^2$, $b$ interventions
allow the adversary to almost control the coin. On the other hand, if $N \gg 1/\delta^2$,
then the $\delta$-bias at every step again allows the adversary to almost control the
coin. Hence, if $b^2 \gg 1/\delta^2$, i.e. $b\delta \gg 1$, then no $N$ will make the majority “good”.
This is not surprising in light of Theorem 3, but the converse statement is more
interesting. It is easy to show that if $b^2 \ll 1/\delta^2$, i.e. $b\delta \ll 1$, any $N$ such that
$b^2 \ll N \ll 1/\delta^2$ will result in the majority being a good extractor (in fact,
$N \approx b/\delta$ is the best). But what if $N \gg 1/\delta^2$? Of course, the majority itself does
not work then. However, we can still trivially extract an almost random bit by
simply ignoring some (say, the first or the last) $N - O(1/\delta^2)$ bits and taking the
majority of the remaining $O(1/\delta^2)$ bits! Collecting these ideas, we get

Lemma 1. If $b\delta = O(1)$, $b = O(\sqrt{N})$ and $\delta = o(1)$, one can extract an almost
random bit from a $(\delta, b, N)$-BCL source: $q(\delta, b, N) = o(1)$. In particular, this can
be done by taking majority of any $\min(N, O(1/\delta^2))$ bits of the source.

Complete Picture. We also notice that Theorem 3 does not imply Theorems 1
and 2, which study the extreme cases of our source. However, by combining
all three results with Lemma 1, we get a complete characterization of the
bit extraction picture from any $(\delta, b, N)$-BCL source. Namely, the following list
covers all the significant cases:

1. If $b = \Omega(\sqrt{N})$ or $\delta = \frac{1}{2} - o(1)$ or $b\delta = \omega(1)$, it is impossible to extract even
a slightly random bit. These results follow from Theorem 1 (even for $\delta = 0$),
Theorem 2 (even for $b = 0$) and Theorem 3 respectively.

2. If $\Omega(1) \le \delta \le \frac{1}{2} - \Omega(1)$ and $b = O(1)$, then one can extract a slightly but
not almost random bit (the lower bound follows from Theorem 2).

3. If $b = O(\sqrt{N})$ and $b\delta = O(1)$ and $\delta = o(1)$, then one can extract an almost
random bit from our source. This is exactly Lemma 1.
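
The extractor of Lemma 1 is simple enough to write down directly. The following is a minimal sketch, assuming the source output is given as a list of 0/1 values; the constant `c` hidden in the $O(1/\delta^2)$ window size is a hypothetical tuning parameter, not something fixed by the text.

```python
import math


def majority_extractor(bits, delta, c=1.0):
    """Majority of the first min(N, ceil(c / delta^2)) bits of the stream."""
    n = len(bits)
    if delta == 0:
        k = n  # no per-bit noise: use every available bit
    else:
        k = min(n, max(1, math.ceil(c / delta ** 2)))
    window = bits[:k]
    # Ties (possible when k is even) are resolved to 0.
    return int(2 * sum(window) > k)
```

Taking the first $k$ bits is an arbitrary choice here; the lemma allows any $\min(N, O(1/\delta^2))$ bits of the source.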

To gain yet another insight into these results, we can let $\sigma = \max(\delta, O(1/\sqrt{N}))$
be the “effective noise” of our source. In other words, if $\delta \ll 1/\sqrt{N}$, increasing $\delta$
to $1/\sqrt{N}$ does not change the behavior of the source much. Then we can restate
our main result as follows: when $b\sigma = \omega(1)$, no good extraction is possible, and
if $b\sigma = O(1)$, good extraction becomes possible.

Expected Number of Interventions to Fix the Outcome. We also study another bit
extraction measure of our source: the expected number of interventions to always
fix the extracted coin (to 0 or 1). Due to space limitations, we leave it to the final
version, where we show that $O(1/\delta)$ expected interventions suffice irrespective
of $N$. Combining with earlier results of [LLS89] for $\delta = 0$ (that $O(\sqrt{N})$
interventions suffice), we conclude that the right answer is $\Theta(\min(1/\delta, \sqrt{N})) = \Theta(1/\sigma)$.

Sampling General Distributions. We can look at the question of sampling general
distributions, and not just a single coin-flip. Not surprisingly, since we could not
even sample a slightly random bit from our source, the same will hold for other
distributions. Namely, if $b\delta = \omega(1)$ and $f$ ideally extracts a non-trivial $Y = f(X)$
(i.e., there is no $y$ s.t. $\Pr(Y = y) = 1 - o(1)$) from our source, then A can influence
$X$ to $X'$ such that $Y' = f(X')$ is statistically far from $Y$: $\|Y - Y'\| \ge \frac{1}{2} - o(1)$.
Thus, no extraction is possible. We leave the details to the final version.

3 Discrete Control Processes 

Alternative View of Our Source. So far we considered the task of our adversary A
to be preventing good bit extraction. However, an equally (if not more) natural
task for A would be to try to force some particular event $\mathcal{E}$, i.e. to force the
string $x = x_1 \ldots x_N$ to satisfy some particular property. To formalize this, let
$\mathcal{E}$ be an event (or property) on $\{0,1\}^N$. Equivalently, $\mathcal{E}$ can be viewed as a
boolean function $e : \{0,1\}^N \to \{0,1\}$, or as a language $L = e^{-1}(1) \subseteq \{0,1\}^N$,
via “$\mathcal{E}$ happened $\iff e(x) = 1 \iff x \in L$”. We define the natural probability
$p$ of $\mathcal{E}$ to be the probability that $\mathcal{E}$ happened for the ideal source (in our special
case, emitting $N$ perfect unbiased bits), i.e. $p = |L|/2^N$, and say that $\mathcal{E}$ is
$p$-sparse. Now we want to see if our adversary A has enough power to significantly
influence the occurrence of $\mathcal{E}$ (i.e., to make $x \in L$). Two dual questions naturally
come up for a given $\delta$, $N$ and $\mathcal{E}$ (with natural probability $p$):

1. For a given number of interventions $b$, what is the smallest probability of
“failure” that A can achieve? In particular, under what conditions can it
be arbitrarily close to 0? Can the answer(s) depend on $p$ but not on other
specifics of $\mathcal{E}$?

2. Assume we want to guarantee success ($\mathcal{E}$ always happens), by allowing a
possibly unbounded number of interventions. What is the smallest expected
number of interventions needed? Can the bound depend on $p$ but not on other
specifics of $\mathcal{E}$?

We define two natural measures that allow us to study the quantities addressed 
in the above two questions. Since $\delta$ is never going to change in our discussion,
we omit it from all the notation below. 

Definition 2. Define

- $F(p, N, b) = \max_{\mathcal{E}} \min_A \Pr(e(x) = 0)$, taken over all $p$-sparse $\mathcal{E}$ and all
$(\delta, b, N)$-BCL A.

- $B(p, N) = \max_{\mathcal{E}} \min_A \mathbf{E}[b]$, taken over all $p$-sparse $\mathcal{E}$ and all $N$-bit sources
A (with noise $\delta$) necessarily producing $x$ satisfying $\mathcal{E}$. Here $\mathbf{E}[b]$ stands for
the expected number of interventions used by A (over the usage of rule (B)).

Thus, $F(p, N, b)$ is the largest probability of A's failure over all $p$-sparse events,
and $B(p, N)$ is the smallest expected number of interventions A needs to always
force any $p$-sparse $\mathcal{E}$. Notice, both times we take the worst-case $p$-sparse $\mathcal{E}$.

Bounding the Probability of Failure. We start with a tight bound on $F(p, N, b)$.

Theorem 4.

$$F(p, N, b) \le \frac{1}{p \cdot (1 + 2\delta)^b} = 2^{\log(1/p) - \Theta(\delta b)} \qquad (2)$$

Thus, if $\delta b = \omega(\log(\frac{1}{p}))$, A can force any $p$-sparse $\mathcal{E}$ with probability $1 - o(1)$.
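
To get a feel for the bound in Equation (2), one can evaluate it numerically. The sketch below is only an illustration: the helper names are hypothetical, and `interventions_for` simply searches for the smallest $b$ that drives the bound below a chosen target (it assumes $\delta > 0$, otherwise the loop would not terminate).

```python
def failure_bound(p, delta, b):
    """Right-hand side of Equation (2): 1 / (p * (1 + 2*delta)^b)."""
    return 1.0 / (p * (1.0 + 2.0 * delta) ** b)


def interventions_for(p, delta, target):
    """Smallest b for which the bound on the failure probability is <= target."""
    if delta <= 0:
        raise ValueError("delta must be positive for the bound to decrease in b")
    b = 0
    while failure_bound(p, delta, b) > target:
        b += 1
    return b
```

Note that $N$ never appears as an argument, matching the fact that the bound is independent of the number of samples.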

Several remarks are in order before the proof. First, $N$ does not enter the
equation above. Second, Theorem 4 immediately implies Theorem 3 (since for
any extraction function $f$, either the event $f(x) = 0$ or the event $f(x) = 1$ has
natural probability $p \ge 1/2$). Finally, the bound in Equation (2) is almost tight,
at least in several significant cases. For example, for $p = 1/2$, Lemma 1 implies
that A cannot almost certainly force 1 on the majority of $\min(N, 1/\delta^2)$ bits
when $\delta b = O(1)$. On the other hand, if $e$ is the function that is 1 on the first $p2^N$
values of $x$ (in the lexicographic order), A has to intervene at least $\Omega(\log(1/p))$
times in order to force $e(x) = 1$ with probability more than $\frac{1}{2} + \delta$.

Proof. The statement is true for $\delta = 0$ or $b = 0$, since $F(p, N, b) \le 1 \le 1/p$,
so assume $\delta > 0$ and $b \ge 1$. Define $g(p, b) = \frac{1}{p(1 + 2\delta)^b}$. We need to show that
$F(p, N, b) \le g(p, b)$ for any $N \ge 1$, $1 \le b \le N$ and $0 \le p \le 1$. We prove this by
induction on $N$. For $N = 1$, $F(0, 1, b) = 1 < \infty = g(0, b)$, and $F(p, 1, b) = 0 \le
g(p, b)$ for $p > 0$ (here we used $b \ge 1$). Now assume the claim is true for $(N - 1)$.

Take any $p$-sparse $\mathcal{E}$ given by a function $e$. Let $e_0 : \{0,1\}^{N-1} \to \{0,1\}$ be
the restriction of $e$ when $x_1 = 0$. Similarly for $e_1$. This defines a $p_0$-sparse event
$\mathcal{E}_0$ and a $p_1$-sparse event $\mathcal{E}_1$ satisfying $\frac{1}{2}(p_0 + p_1) = p$. Without loss of generality
assume $p_0 \ge p \ge p_1$. Given such $\mathcal{E}$, our particular adversary A will consider two
options and pick the best (using its unbounded computational resources): either
use an intervention (which is legal since we assumed $b \ge 1$) and fix $x_1 = 0$,
reducing the question to that of analyzing the $p_0$-sparse event $\mathcal{E}_0$ on $(N - 1)$
variables and also reducing $b$ by 1, or use rule (B), making the 0-probability of
$x_1$ equal to $\frac{1}{2} + \delta$ and leaving the same $b$. We get the following recurrence:



F{p, N, b) < min[ F(po, iV - 1, & - 1), 

.i^(pi,7V-l,6)+ Q + F(po,iV-l,6) ] 

Let Po = p{l + P) and pi = p{l — f3), where 0 < /? < 1 (since po> P> Pi)- Using 
our inductive assumption, 



F{p, N, b) < min( g{p{l -h /?), 6 - 1), 

Q - -gipi^- P),b) + Q + ■ g{p{^ + fi),b) ) < g{p,b) 



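Recalling the definition of $g$, it thus suffices to show that

$$\min\left( \frac{1}{p(1+\beta)(1+2\delta)^{b-1}},\;
\frac{\frac{1}{2} - \delta}{p(1-\beta)(1+2\delta)^{b}} + \frac{\frac{1}{2} + \delta}{p(1+\beta)(1+2\delta)^{b}} \right) \le \frac{1}{p(1+2\delta)^{b}}$$

$$\iff\quad \min\left( \frac{1+2\delta}{1+\beta},\; \frac{\frac{1}{2} - \delta}{1-\beta} + \frac{\frac{1}{2} + \delta}{1+\beta} \right) \le 1$$

We show that the inequality above holds for any $\beta \in [0, 1]$ (since the choice of $\beta$
is outside of our control). We see that the expressions under the minimum are
equal when $\beta = 2\delta$. The following two cases on $\beta$ complete the proof.

- Case 1. Assume $\beta \ge 2\delta$. Then the minimum above is $\frac{1+2\delta}{1+\beta}$ and we need to
show that $\frac{1+2\delta}{1+\beta} \le 1$, which is equivalent to $\beta \ge 2\delta$.

- Case 2. Assume $\beta \le 2\delta$. Then the minimum above is
$\frac{1/2 - \delta}{1-\beta} + \frac{1/2 + \delta}{1+\beta} = \frac{1 - 2\delta\beta}{1 - \beta^2}$, and we
need to show that $\frac{1 - 2\delta\beta}{1 - \beta^2} \le 1$, which is equivalent to $\beta \le 2\delta$.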
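
The case analysis can be sanity-checked numerically. The sketch below evaluates the two expressions under the minimum (after multiplying through by $p(1+2\delta)^b$) on a grid of $\beta$ and $\delta$; the grid and the function name are illustrative choices, and $\beta = 1$ is excluded because the second expression is undefined there.

```python
def branch_terms(beta, delta):
    """Normalized costs of the two adversarial options in the induction step."""
    intervene = (1.0 + 2.0 * delta) / (1.0 + beta)
    use_bias = (0.5 - delta) / (1.0 - beta) + (0.5 + delta) / (1.0 + beta)
    return intervene, use_bias


def worst_ratio(steps=100):
    """Maximum of min(intervene, use_bias) over a grid with beta < 1, delta <= 1/2."""
    worst = 0.0
    for i in range(steps):           # beta = 0, 1/steps, ..., (steps-1)/steps
        beta = i / steps
        for j in range(1, steps + 1):  # delta = 0.5/steps, ..., 0.5
            delta = 0.5 * j / steps
            worst = max(worst, min(branch_terms(beta, delta)))
    return worst
```

On this grid the minimum never exceeds 1 (up to rounding), and the two branches coincide at $\beta = 2\delta$, as the proof claims.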



Bounding Expected Number of Interventions. We also show a tight bound on
$B(p, N)$. Namely, $B(p, N) = \Theta(\frac{1}{\delta} \log(\frac{1}{p}))$ (in particular, this bound is
independent of $N$). Due to space limitations, we leave the proof to the final version.

Generalizing to Any Stochastic Process. As we mentioned earlier, our results
can be generalized to any stochastic process $\mathcal{P}$. Namely, the notion of natural
probability $p$ of $\mathcal{E}$ and the quantities $F_{\mathcal{P}}(p, N, b)$ and $B_{\mathcal{P}}(p, N)$ can now be
defined w.r.t. $\mathcal{P}$ completely analogously to Definition 2. While the proofs
become significantly more involved, in the final version we show for any $\mathcal{P}$:
$F_{\mathcal{P}}(p, N, b) \le (1 - \delta)^b / p = 2^{\log(1/p) - \Omega(\delta b)}$ and $B_{\mathcal{P}}(p, N) \le \log_{1-\delta} p = O(\frac{1}{\delta} \cdot \log(\frac{1}{p}))$.

4 Black-Box Transformations and Adaptive Coin-Flipping 

The Setting. Collective Coin-Flipping, introduced by Ben-Or and Linial [BL90],
is a problem where $n$ (computationally unbounded) processors are trying to
generate a random bit in a setting where only a single broadcast channel is
available for communication. At most $b$ out of $n$ players can be controlled by
a central adversary A (which is called $b$-bounded) who is trying to bias the
resulting coin. Given a protocol $\Pi$, we let $\Delta_\Pi(b)$ be the largest bias achieved by
a $b$-bounded adversary against $\Pi$. $\Pi$ is said to be $b(n)$-resilient if $\Pi$ produces a
slightly random coin: $\Delta_\Pi(b(n)) \le \frac{1}{2} - \Omega(1)$, where the constant is independent
of $n$. Similarly, $\Pi$ is said to be strongly $b(n)$-resilient if $\Pi$ produces an almost
random coin: $\Delta_\Pi(b(n)) = o(1)$. As we said in the introduction, it makes a crucial
difference whether A is static (decides whom to corrupt before the protocol
starts), or adaptive (decides whom to corrupt during the protocol).

Coin-Flipping with Static Adversaries. The optimal resilience threshold for static
adversaries is $n/2$: any $n/2$ players can always fix the coin [S89, BN00], while
there exist $(\frac{1}{2} - \varepsilon)n$-resilient protocols (even constructive and efficient ones) for
any $\varepsilon > 0$ [BN00, RZ98, F99]. We also point out a very simple dependence of the
optimal bias $\Delta(b)$ (defined to be the smallest bias achieved by a coin-flipping
protocol: $\min_\Pi \Delta_\Pi(b)$) on the number of players: $\Delta(b) = \Theta(b/n)$.

Finally, we point out that all the best statically secure coin-flipping protocols 
are not even 1-resilient against adaptive adversaries. This is due to a historical 
feature that all such protocols first elect a single (hopefully, not faulty) represen- 
tative player (called a leader), who then flips the final coin by itself. Corrupting 
such a leader at the end clearly controls the coin. 

Coin-Flipping with Adaptive Adversaries. Adaptive adversaries were already
considered by Ben-Or and Linial [BL90], who observed that the “majority”
protocol (each player sends a random bit, and the final coin is their majority)
achieves adaptive $\Theta(\sqrt{n})$-resilience. Surprisingly enough, this simple protocol
is the best known adaptively secure coin-flipping protocol! In fact, Ben-Or and
Linial [BL90] conjectured this protocol to be optimal! This conjecture (call
it (*)), if true, would imply that adaptive adversaries are much more powerful
than static adversaries (where the threshold is $n/2$) for the problem of collective
coin-flipping. Interestingly enough, the only result in support of conjecture
(*) comes from the bit-fixing source of [LLS89]. Namely, when each player sends
only 1 bit in the entire protocol, the optimal behavior of the adversary is exactly
the same as in the bit-fixing source with $b$ interventions! Since the majority was
the best bit extraction function for the bit-fixing source, conjecture (*) is true in
this case. This result is interesting since it already illustrates the power of
adaptivity. Namely, in the static case one can achieve $\Omega(n/\log^2 n)$-resilience [AL93]
when players send only 1 bit, even in one round. However, it supports
conjecture (*) much less than it seems to. Indeed, restricting each player to send at
most 1 bit seems like a huge limitation. For example, it is very limiting even for
statically secure protocols (i.e., no function can be more than $O(n/\log n)$-resilient
by the result of [KKL89], and there are general $n/2$-resilient statically secure
protocols).

To summarize, adaptively secure coin-flipping is much less understood than
its static counterpart. There seems to be some indication that adaptive
adversaries are much more powerful than static adversaries, but there is little formal
evidence supporting this claim.



Black-Box Reductions. Due to space limitations, we leave the formal treatment
to the final version, and instead provide informal (but informative) intuition for
our approach, which we already sketched in the introduction. Namely, we want to
sequentially run a static coin-flipping protocol $\Pi$ $N$ times, and try to extract
a good coin from the $N$ outcomes $x_1 \ldots x_N$. If $\delta = \Delta_\Pi(b)$, then we have reduced
the adaptive adversary A to a $(\delta, b, N)$-BCL source: rule (A) corresponds to
corrupting a player during one of the $N$ sub-protocols, while rule (B) corresponds
to not doing so and using the power of the static adversary. Notice, while $b$ and
$\delta$ are fixed (given $n$), we have the power to make $N$ really huge, which seems
to give us a considerable advantage. Unfortunately, the strong negative result of
Theorem 3 shows that this advantage is, actually, an illusion. Namely, our results
say that the possibility of bit extraction from our source depends on whether
$b\delta = O(1)$ or $b\delta = \omega(1)$, i.e. a large number of repetitions $N$ does not help.

Nevertheless, when is $b\delta = O(1)$? Notice that the best $\delta$ we could hope for
(without looking at the specifics of $\Pi$), while definitely no more than $\Delta(b)$,
cannot be much less than $\Delta(b) = \Theta(b/n)$ either. For example, at the very beginning
A could corrupt $b/2$ players, which gives $\delta \ge \Delta(b/2) = \Theta(\Delta(b))$, and still have
$b/2$ arbitrary corruptions left. Hence, our “black-box” approach can work (and
actually can be made to work) only if $b \cdot \Theta(b/n) = O(1)$, i.e. $b = O(\sqrt{n})$. Since
such $b$ can be trivially achieved by the majority protocol, we cannot achieve
adaptive security (beyond what is known) “for free”.

Discussion. We are not saying that black-box transformations are the most
natural way to achieve adaptive security. However, the “breaking point” of our
approach is exactly the (believed to be optimal) $b = \Theta(\sqrt{n})$. The latter “coincidence”
does give some further evidence for conjecture (*).

Abstract. It is known that random $k$-SAT instances with at least $dn$
clauses, where $d = d_k$ is a suitable constant, are unsatisfiable (with high
probability). This paper deals with the problem of certifying the
unsatisfiability of a random 3-SAT instance in polynomial time. A backtracking
based algorithm of Beame et al. works for random 3-SAT instances with
at least $n^2/\log n$ clauses. This is the best result known by now.

We improve the $n^2/\log n$ bound attained by Beame et al. to $n^{3/2+\varepsilon}$ for
any $\varepsilon > 0$. Our approach extends the spectral approach introduced to
the study of random $k$-SAT instances for $k \ge 4$ in previous work of the
second author.



Introduction 

We study the complexity of certifying unsatisfiability of random 3-SAT instances
(or 3-CNF formulas) over $n$ propositional variables. The probability space of
random 3-SAT instances has been widely studied in recent years for several
good reasons. The most recent literature is [Ac2000], [Fr99], [Be et al97].

One of the reasons for studying random 3-SAT instances is that they have
the following sharp threshold behaviour [Fr99]: There exist values $c = c(n)$ such
that for any $\varepsilon > 0$ formulas with at most $(1 - \varepsilon) \cdot c \cdot n$ clauses are satisfiable
whereas formulas with at least $(1 + \varepsilon) \cdot c \cdot n$ clauses are unsatisfiable with high probability
(that means with probability tending to 1 when $n$ goes to infinity). Note that
the aforementioned result does not say that $c = c(n)$ is a constant; however, the
general conjecture is that $c(n)$ converges to a constant. Much recent work tries
to approximate $c(n)$ and the currently best results are that $c(n)$ is at least 3.145
[Ac2000] and at most 4.601 [KiKrKr98]. Inaccessible to the authors at the time
of writing is a FOCS 2000 paper making further progress on the lower bound for
$c(n)$. (For random 2-SAT instances the analogous threshold is at $c = 2$ [ChRe92],
[Go96].)

The algorithmic interest in this threshold is due to the empirical observation
that random 3-SAT instances at the threshold, i.e. with around $c \cdot n$ random



F. Orejas, P.G. Spirakis, and J. van Leeuwen (Eds.): ICALP 2001, LNCS 2076, pp. 310-^^3 2001. 
(c) Springer- Verlag Berlin Heidelberg 2001 



Recognizing More Unsatisfiable Random 3-SAT Instances Efficiently 311 



clauses are hard instances. The following behaviour has been reported
consistently in experimental studies with backtracking algorithms for satisfiability, see
for example [SeMiLe96], [CrAu96]: The average running time is quite low for
instances below the threshold. For instances with at most $4n$ clauses the formulas
are satisfiable and it is quite easy to find a satisfying assignment. A precipitous
increase in the average complexity is observed at the threshold. For $4.2n$ clauses
about half of the generated formulas are satisfiable and it is difficult to decide
if a formula is satisfiable or not. Finally, a speedy decline to lower complexity is
observed beyond the threshold. All instances with $4.5n$ clauses are unsatisfiable
and the running time decreases again (in spite of the fact that now the whole
backtracking tree must be searched). Note however that the decline in running
time cannot yield polynomial average time. This follows from the paper [ChSz88],
on which we comment further below.

Apart from trivial observations there are no general complexity theoretical
results relating the threshold to hardness. The relationship of hardness and
thresholds has also been observed for $k$-colourability of random graphs with a linear
number of edges [PeWe89], [AcFr99] and for the subset sum problem, see [ImNa96]
for a theoretical discussion.



Abandoning the general point of view and looking at concrete algorithms,
the following results are known for random 3-SAT instances: All progress in
approximating the threshold from below is based on the analysis of rather simple
polynomial time heuristics. In fact the most advanced heuristic being analyzed
[Ac2000] only finds a satisfying assignment with probability of at least $\varepsilon$, where
$\varepsilon > 0$ is a small constant, for 3-SAT formulas with at most $3.145n$ clauses. The
heuristic in [FrSu96] almost always finds a satisfying assignment for
3-SAT instances with at most $3.003n$ clauses. On the other hand the progress
made in approximating the threshold from above does not provide us at all
with efficient algorithms. Here only the expectation of the number of satisfying
assignments is calculated and is shown to tend to 0.



In fact, beyond the threshold we have negative results: For arbitrary but fixed
$d$ beyond the threshold, random 3-SAT instances with $dn$ clauses (are
unsatisfiable and) have only resolution proofs with an exponential number, that is with at
least $2^{\Omega(n)}$, clauses [ChSz88]. This has been improved upon by [Fu98], [BePi96],
and [Be et al97], all proving (exponential) lower bounds for larger clause/variable
ratios. Note that the size of resolution proofs is a lower bound on the number of
nodes in any classical backtracking tree as generated by any variant of the well
known Davis-Putnam procedure.



Next we come to the historical development of polynomial time results be- 
yond the threshold. In it is shown that 3-SAT formulas with at least 

clauses allow for polynomial size resolution proofs. This is strengthened in 
[Be et al97] to the best result known by now: For random 3-SAT instances with
at least $n^2/\log n$ clauses a backtracking based algorithm proves unsatisfiability
in polynomial time with high probability. (The result of Beame et al. is slightly
stronger as it applies to formulas with $\Omega(n^2/\log n)$ clauses.) For general $k$-SAT
the algorithm of Beame et al. works for formulas with at least $n^{k-1}/(\log n)^{k-2}$
clauses. This is improved in [GoKr2001], where a spectral approach is shown to
work for at least $\mathrm{poly}(\log n) \cdot n^{k/2}$ random clauses for even $k$. For odd $k$ we get
the exponent $(k + 1)/2$ instead of $k/2$. This implies that for $k = 3$ the result of
Beame et al. is still the best known.

We extend the approach of our previous paper to show that for any $\varepsilon > 0$
random 3-SAT instances with at least $n^{3/2+\varepsilon}$ clauses can be efficiently certified
as unsatisfiable, thus improving the previous best bound of Beame et al. As in
[GoKr2001] we associate a graph with a given formula. Then we show how to
certify unsatisfiability of the formula with the help of the eigenvalue spectrum
of the adjacency matrix of this graph. Note that the eigenvalue spectrum can
be approximated with sufficient precision in polynomial time by standard linear
algebra methods.

One technical contribution of the present paper when compared to
[GoKr2001] is a lemma bounding the size of the largest independent set of
vertices of a graph with the help of the eigenvalue gap of the adjacency matrix of
this graph. We speculate that this may be of independent interest. In [GoKr2001]
we use a matrix derived from the adjacency matrix instead of the adjacency ma- 
trix itself to bound the size of the largest independent set. In bounds on 

the size of the largest independent set are given in terms of the spectral gap 
of the Laplacian matrix of the graph. These results cannot be directly applied 
to the present situation because in our case it is not clear how to estimate the 
eigenvalues of the Laplacian matrix, instead of those of the adjacency matrix. 

Note that eigenvalues can be used to help find a solution to a random
instance of an NP-complete problem [AlKa94] or to prove the absence of a solution,
as in our case, for example.



1 From Random 3-SAT Instances to Random Graphs



We consider a family of probability spaces of random 3-SAT instances, $\mathrm{Form}_{n,p} =
\mathrm{Form}_{n,p,3}$, which is defined as follows: The set of 3-clauses is the set of all
3-tuples $l_1 \vee l_2 \vee l_3$ where each $l_i$ is a literal over a standard set of $n$ propositional
variables. A literal is either a propositional variable $x$ or its negation $\neg x$. As
double occurrences of literals inside of clauses are allowed we have altogether
$(2n)^3 = 8n^3$ 3-clauses. A random instance $F$ from $\mathrm{Form}_{n,p}$ is obtained by adding
each single clause independently with probability $p$ to $F$. We think of the clauses
of $F$ as being joined conjunctively and write $F = \{C_1, \ldots, C_m\} = C_1 \wedge \ldots \wedge
C_m$. In the sequel we assume that $p = p(n) = 1/n^{1+\gamma}$ where $1/2 > \gamma > 0$
is a constant. Note that our space of formulas is analogous to the space of
random graphs $G_{n,p}$. The number of clauses in a random instance from $\mathrm{Form}_{n,p}$
follows the binomial distribution with parameters $8n^3$ and $p$, $\mathrm{Bin}(8n^3, p)$, and
the expected number of clauses is $8n^3 \cdot p = 8n^{2-\gamma}$.

Another popular family of probability spaces of random 3-SAT instances is
the space $\mathrm{Form}_{n,m}$. Here each formula is a set of exactly $m$ clauses
and each formula has probability $1/\binom{8n^3}{m}$. $\mathrm{Form}_{n,m}$ is
analogous to the space of random graphs $G_{n,m}$. In line with common usage, we
are confident that our results also apply to $\mathrm{Form}_{n,m}$ if
$m = 8n^3 \cdot p$ is the expected number of clauses in our $\mathrm{Form}_{n,p}$
model. See [Bo85] for a general theorem relating analogous random graph models.
Clauses might be defined in a slightly different way, too: They might be sets of
literals, they might be tuples without repetitions, they might be
non-tautological, that is, not containing $x$ and $\neg x$ together. Again we
assume, without actually checking, that our results apply to the other
probability spaces, too.
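The sampling model can be made concrete with a short sketch. The encoding below is an assumption for illustration only: a literal is a nonzero integer (+v for variable v, -v for its negation), a clause is an ordered triple of literals, and the name sample_formula is hypothetical. It enumerates all $(2n)^3$ ordered clauses, so it is only meant for small $n$.

```python
import itertools
import random


def sample_formula(n, p, seed=None):
    """Draw a formula from the binomial model: every ordered 3-clause over
    n variables is included independently with probability p.

    Literals are nonzero ints: +v is variable v, -v its negation (1 <= v <= n).
    """
    rng = random.Random(seed)
    literals = list(range(1, n + 1)) + [-v for v in range(1, n + 1)]
    # There are (2n)^3 = 8n^3 ordered triples; repeated literals are allowed.
    return [clause for clause in itertools.product(literals, repeat=3)
            if rng.random() < p]
```

With p = 1 every one of the $8n^3$ clauses is present, and with p = 0 the formula is empty, which gives a quick sanity check of the clause count.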

We state a graph-theoretical condition which implies the unsatisfiability of
a 3-SAT instance $F$ over $n$ propositional variables. To this end we define the
graphs $G_F$ and $H_F$. The graph $G_F = (V_F, E_F)$ is defined as follows:

— $V_F$ is the set of ordered pairs over the $n$ propositional variables. We have
$|V_F| = n^2$.

— The edge $(a_1,b_1) - (a_2,b_2)$ (where, in order to avoid loops,
$(a_1,b_1) \neq (a_2,b_2)$, that is $a_1 \neq a_2$ or $b_1 \neq b_2$) is in $E_F$ if
there exists a variable $z$ such that $F$ contains both clauses
$a_1 \vee a_2 \vee z$ and $b_1 \vee b_2 \vee \neg z$ (or both clauses
$a_2 \vee a_1 \vee z$ and $b_2 \vee b_1 \vee \neg z$; note, however, that our graph
is undirected and it is not strictly necessary to mention this possibility
explicitly). Note that the $a_i$ and $b_i$ are variables, that is positive literals.

The graph $H_F$ is defined totally analogously but with different clauses: Its
vertices are, as before, ordered pairs of variables and $(a_1,b_1) - (a_2,b_2)$
is an edge iff $F$ has the clauses $\neg a_1 \vee \neg a_2 \vee z$ and
$\neg b_1 \vee \neg b_2 \vee \neg z$ for a variable $z$.
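The definition of the two graphs can be sketched directly in code. This is an illustrative sketch, not an implementation from the paper: the name build_graphs and the integer encoding of literals (+v for a variable, -v for its negation, clauses as ordered triples) are assumptions. Clauses are grouped by their last literal so that each clause ending in $z$ is paired with each clause ending in $\neg z$.

```python
from collections import defaultdict


def build_graphs(formula):
    """Return (edges_G, edges_H) for a list of clauses (l1, l2, l3).

    Vertices are ordered pairs of variables; an edge is a frozenset of two
    distinct vertices {(a1, b1), (a2, b2)}.
    """
    # z_pairs[s][z]: pairs (x1, x2) from clauses (s*x1, s*x2, +z)
    # not_z_pairs[s][z]: pairs (x1, x2) from clauses (s*x1, s*x2, -z)
    z_pairs = {s: defaultdict(list) for s in (+1, -1)}
    not_z_pairs = {s: defaultdict(list) for s in (+1, -1)}
    for l1, l2, l3 in formula:
        if (l1 > 0) != (l2 > 0):
            continue  # mixed signs in the first two positions are never used
        sign = 1 if l1 > 0 else -1
        bucket = z_pairs if l3 > 0 else not_z_pairs
        bucket[sign][abs(l3)].append((abs(l1), abs(l2)))

    def edges(sign):
        found = set()
        for z, a_list in z_pairs[sign].items():
            for a1, a2 in a_list:
                for b1, b2 in not_z_pairs[sign].get(z, []):
                    u, v = (a1, b1), (a2, b2)
                    if u != v:  # loops are excluded by definition
                        found.add(frozenset((u, v)))
        return found

    # sign +1 uses all-positive pairs (G_F), sign -1 all-negative pairs (H_F)
    return edges(+1), edges(-1)
```

For instance, the two clauses (1, 2, 3) and (1, 2, -3) yield the single edge between the vertices (1, 1) and (2, 2): the first coordinates come from the clause containing $z$, the second coordinates from the clause containing $\neg z$.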

Some comments concerning the intuition behind this definition: In
[GoKr2000] we give an efficient algorithm which demonstrates the unsatisfiability
of random 4-SAT instances with at least $\mathrm{poly}(\log n) \cdot n^2$ clauses.
Here we build on the techniques introduced in that paper. The clause
$a_1 \vee a_2 \vee b_1 \vee b_2$ is obtained by resolution [Sch] with $z$ from the
two clauses $a_1 \vee a_2 \vee z$ and $b_1 \vee b_2 \vee \neg z$ which define an edge
of $G_F$. Similarly, $\neg a_1 \vee \neg a_2 \vee \neg b_1 \vee \neg b_2$ is obtained
from $\neg a_1 \vee \neg a_2 \vee z$ and $\neg b_1 \vee \neg b_2 \vee \neg z$. The
correctness of resolution states that $F$ is unsatisfiable if a set of resolvents
of $F$ is unsatisfiable.

For any given $z$ the number of clauses like $a_1 \vee a_2 \vee z$ and
$b_1 \vee b_2 \vee \neg z$ is concentrated at the expectation
$\approx n^2 \cdot p = n^{1-\gamma}$. Applying resolution with $z$ to all these
clauses gives $\approx n^{2-2\gamma} > n$ clauses $a_1 \vee a_2 \vee b_1 \vee b_2$.
Doing this for all $n$ variables $z$ gives $> n^2$ all-positive clauses of size 4.
In the same way we get $> n^2$ all-negative 4-clauses. With the help of the
technique introduced in [GoKr2000] we get an efficient algorithm which
demonstrates unsatisfiability of these newly obtained 4-clauses, and the
correctness of resolution implies that $F$ itself is unsatisfiable. Note that our
graphs $G_F$ and $H_F$ reflect the sets of clauses obtained by resolution as above
in that each clause induces an edge in one of these graphs.

Some detailed remarks concerning $G_F$: Only for technical reasons the
variable $z$ which is resolved upon is the last variable in our clauses. (Recall
we consider clauses as ordered triples.) More important is the fact that the edge
reflects the resolvent $a_1 \vee a_2 \vee b_1 \vee b_2$ not in the most natural way
by the edge $(a_1,a_2) - (b_1,b_2)$ but by $(a_1,b_1) - (a_2,b_2)$. The variables
of the vertices connected by the edge come from the different clauses taking part
in the resolution. This is important because it increases the independence of the
edges of $G_F$ when $F$ is a random formula. Again more of a technical nature is
the convention that the variables in the first position of each vertex come from
the clause which contains the positive literal $z$, whereas the second variables
$b_1, b_2$ come from the clause with $\neg z$.

Recall that a set $S$ of vertices of a graph $G$ is an independent set iff there
is no edge inside of $S$, and $\alpha(G)$ is the independence number of $G$, that
is the maximum number of vertices of an independent set of $G$. The independence
number is NP-hard to determine. Therefore we cannot use the following theorem
directly to get an efficient algorithm certifying unsatisfiability by simply
computing the independence number of $G_F$ and $H_F$. The proof of the next
theorem relies on the correctness proof of resolution [Sch] and is not difficult.

Theorem 1. If $F$ is a 3-SAT instance over $n$ variables which is satisfiable then
we have:

$\alpha(G_F) \geq n^2/4$ or $\alpha(H_F) \geq n^2/4$.
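One way to see why Theorem 1 is plausible is to build the large independent set from a satisfying assignment. The sketch below rests on an assumption about the omitted proof: under a satisfying assignment, the pairs of false variables are taken as an independent set in $G_F$ and the pairs of true variables as one in $H_F$, since both resolved clauses cannot be satisfied otherwise. The names resolution_edges and independent_witness and the integer literal encoding are hypothetical.

```python
from itertools import product


def resolution_edges(formula, s):
    """Edges of G_F (s = +1) or H_F (s = -1), found by brute force over
    all ordered pairs of clauses (c1, c2, z) and (d1, d2, -z)."""
    out = set()
    for c1, c2, c3 in formula:
        for d1, d2, d3 in formula:
            same_sign = all(s * lit > 0 for lit in (c1, c2, d1, d2))
            if c3 > 0 and d3 == -c3 and same_sign:
                u, v = (abs(c1), abs(d1)), (abs(c2), abs(d2))
                if u != v:
                    out.add(frozenset((u, v)))
    return out


def independent_witness(n, assignment):
    """Given a satisfying assignment (dict var -> bool), return (s, S):
    S is a set of at least n^2/4 vertices that is independent in G_F
    (s = +1) or in H_F (s = -1)."""
    false_vars = [v for v in range(1, n + 1) if not assignment[v]]
    true_vars = [v for v in range(1, n + 1) if assignment[v]]
    if len(false_vars) >= len(true_vars):
        # a1, a2, b1, b2 all false: a1 v a2 v z and b1 v b2 v -z cannot both hold
        return +1, set(product(false_vars, repeat=2))
    # a1, a2, b1, b2 all true: the all-negative clause pair cannot both hold
    return -1, set(product(true_vars, repeat=2))
```

Since the larger of the two variable classes has at least $n/2$ elements, the returned set has at least $n^2/4$ vertices.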

Within $G_F$ (and $H_F$) the presence or absence of an edge is not independent
of that of another edge, and so techniques from the area of standard random
graphs cannot be applied without further consideration. From now on we restrict
attention to $G_F$; of course everything applies also to $H_F$. We collect some
basics about $G_F$.

An edge $(a_1,b_1) - (a_2,b_2)$ in $G_F$ is only possible if $a_1 \neq a_2$ or
$b_1 \neq b_2$.

We take a look at the structure of the clause sets which induce the fixed edge
$(a_1,b_1) - (a_2,b_2)$. The edge $(a_1,b_1) - (a_2,b_2)$ is in $G_F$ iff $F$
contains for a variable $z$ at least one of the pairs of clauses
$a_1 \vee a_2 \vee z$ and $b_1 \vee b_2 \vee \neg z$ (or one of the pairs
$a_2 \vee a_1 \vee z$ and $b_2 \vee b_1 \vee \neg z$).

Case 1: $a_1 \neq a_2$ and $b_1 \neq b_2$. In this case all the "$z$-clauses"
necessary to induce the edge are distinct and all $\neg z$-clauses, too. As the
$z$- and $\neg z$-clauses are all distinct from each other, too, we have $2n$
disjoint pairs of clauses which induce the edge $(a_1,b_1) - (a_2,b_2)$.

Case 2: $a_1 = a_2$ and $b_1 \neq b_2$. In this case the clauses
$a_1 \vee a_2 \vee z$ necessary for the edge are all distinct. However
$a_1 \vee a_2 \vee z = a_2 \vee a_1 \vee z$. The $\neg z$-clauses are all distinct
and also the $z$- and $\neg z$-clauses. We have altogether $2n$ pairs of clauses
where always two pairs have the common clause $a_1 \vee a_2 \vee z$. The last case
$a_1 \neq a_2$ and $b_1 = b_2$ is analogous to the second case.

With these observations we can get a first impression of the probability of a
fixed edge in $G_F$: If $a_1 \neq a_2$ and $b_1 \neq b_2$ the number of pairs of
clauses which induce the edge $(a_1,b_1) - (a_2,b_2)$ is distributed as
$\mathrm{Bin}(2n, p^2)$. The probability that the edge is induced by two pairs of
clauses is at most $\binom{2n}{2} \cdot p^4 = o(2np^2)$. This makes it intuitively
clear that the probability of $(a_1,b_1) - (a_2,b_2)$ being in $G_F$ is about
$2n \cdot p^2$.

If $a_1 = a_2$ and $b_1 \neq b_2$ we observe that the number of clauses like
$b_1 \vee b_2 \vee \neg z$ or $b_2 \vee b_1 \vee \neg z$ is distributed as
$\mathrm{Bin}(2n, p)$. The probability to have at least two of these clauses is
$o(2np)$. Conditioning on the occurrence of at least one of these clauses it
becomes intuitively clear that the probability of the edge
$(a_1,b_1) - (a_2,b_2)$ should also be about $2n \cdot p^2$. The following two
results make this precise.

Lemma 1. We fix the edge $e = (a_1,b_1) - (a_2,b_2)$ and recall
$p = 1/n^{1+\gamma}$.

(a) For $a_1 \neq a_2$ and $b_1 \neq b_2$ we have that

$\Pr[e \text{ is an edge of } G_F] = 2n \cdot p^2 \cdot (1 + O(1/n^{1+2\gamma}))$.

(b) For $a_1 = a_2$ and $b_1 \neq b_2$ this probability is

$2n \cdot p^2 \cdot (1 + O(1/n^{1+\gamma}))$.

The same applies of course to $a_1 \neq a_2$ and $b_1 = b_2$.

The preceding lemma implies the following expectations:

Corollary 1. (a) $E[\text{Number of edges of } G_F] = n^{3-2\gamma} \cdot (1 + O(1/n))$.

(b) $E[\text{Degree of the vertex } (a_1,b_1)] = 2n^{1-2\gamma} \cdot (1 + O(1/n))$.

Observe that $n^2 \cdot 2n^{1-2\gamma} = 2n^{3-2\gamma}$, reflecting the fact that
the sum of the degrees of all vertices is two times the number of edges. The
number of vertices altogether is equal to $n^2$ and the probability of a given
edge is $\approx 2/n^{1+2\gamma}$. Disregarding edge dependencies, $G_F$ is a
random graph $G_{n^2,p'}$ where $p' = 2/n^{1+2\gamma}$. As $\gamma < 1/2$ this
situation is equivalent to that of a random graph over $n$ vertices with edge
probability $n^\delta/n$ with $\delta > 0$.

2 Concentration of the Degree 

The degree of a given vertex of a random graph with $n$ vertices and edge
probability $n^\delta/n$ follows the binomial distribution
$\mathrm{Bin}(n-1, n^\delta/n)$. Exponential tail bounds for the binomial
distribution imply that each vertex has its degree sharply concentrated at its
expectation $\approx n^\delta$. (Note that $\exp(-n^\delta) = o(1/n)$ and we can
proceed from a fixed vertex to all vertices.) To show an analogous result for
$G_F$ we consider a fixed vertex $(a_1,b_1)$ and determine the number of edges
like $(a_1,b_1) - (a_2,b_2)$. Before looking at the degree of $(a_1,b_1)$ itself
we look at the number of unordered pairs of clauses

$a_1 \vee a_2 \vee z$ and $b_1 \vee b_2 \vee \neg z$    (1)

in a random $F$ where $a_2, z, b_2$ are arbitrary. To show concentration of the
number of pairs of clauses as in (1) we follow [AlSp92], Chapter 8, Section 4.
The technical problem to deal with is that distinct pairs are not always disjoint
and thus are not independent. Nevertheless the intuition is that they are nearly
independent and we have:

Theorem 2. Let $\varepsilon > 0$ be fixed, let $X$ be the random variable giving
the number of pairs of clauses as in (1) and let $\mu$ be the expectation of $X$.
Then $\Pr[|X - \mu| > \varepsilon\mu] = o(1/n^2)$.

Some pairs of clauses as in (1) might induce the same edge. But that does
not destroy concentration:

Corollary 2. Let $a_1$ and $b_1$ be two variables and $\varepsilon > 0$. The degree
of the vertex $(a_1,b_1)$ in $G_F$ is with probability $1 - o(1/n^2)$ between
$2n^{1-2\gamma}(1-\varepsilon)$ and $2n^{1-2\gamma}(1+\varepsilon)$. Moreover, with
high probability the degree of each vertex is within the specified interval.



3 Spectral Considerations 



In this section we prove a general relationship between the size of an
independent set in a graph and the eigenvalues of its adjacency matrix. Then we
prove that the random graphs $G_F$ and $H_F$ satisfy certain eigenvalue bounds
with high probability. These eigenvalue bounds certify that the graphs $G_F$ and
$H_F$ do not have independent sets as required by Theorem 1 in order for $F$ to
be satisfiable. Background from spectral graph theory can be found for regular
graphs in [AlSp92] and for the general case in [Ch97]. The linear algebra
required is well presented in [St88].

Let $G = (V, E)$ be a standard undirected graph and $A_G$ the adjacency matrix
of $G$. Let $A_G$'s eigenvalues be ordered
$\lambda_1 \geq \cdots \geq \lambda_n$, with $n = |V|$. We say that $G$ is
$\nu$-separated if $|\lambda_i| \leq \nu\lambda_1$ for $i > 1$. With
$\lambda = \max_{i>1} |\lambda_i|$ this reads $\lambda \leq \nu\lambda_1$.

We say that $G$ is $\varepsilon$-balanced for some $\varepsilon > 0$ if there is a
real $d$ such that the degree of each vertex is between $d(1-\varepsilon)$ and
$d(1+\varepsilon)$.

Theorem 3. If $G$ is $\nu$-separated and $\varepsilon$-balanced, then $G$ contains
no independent set of size $> (n/5) + n \cdot f(\nu,\varepsilon)$ where
$f(\nu,\varepsilon)$ tends to 0 as $\nu, \varepsilon$ tend to 0.

We remark that this theorem can probably be greatly improved upon. But this
weak theorem does preclude independent sets of size $n/4$ for small
$\nu, \varepsilon$, and that is all we need here.
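To get a feeling for the quantities in Theorem 3, the following sketch evaluates the bound numerically. It uses the functions $g(\nu,\varepsilon) = (\varepsilon + (1+\varepsilon)\nu)/(1-\varepsilon)$ and $f = (4/5)(g + g^2)$ from the proof of the theorem; the function names and the helper balance_epsilon (which returns the smallest $\varepsilon$ for which a degree sequence is $\varepsilon$-balanced, taking $d$ as the midpoint of minimum and maximum degree) are illustrative assumptions.

```python
def g(nu, eps):
    """The auxiliary quantity g(nu, eps) from the proof of Theorem 3."""
    return (eps + (1 + eps) * nu) / (1 - eps)


def f(nu, eps):
    """f(nu, eps) = (4/5) * (g + g^2); tends to 0 as nu, eps tend to 0."""
    gv = g(nu, eps)
    return 0.8 * (gv + gv * gv)


def independent_set_bound(n, nu, eps):
    """Upper bound n/5 + n*f(nu, eps) on the size of an independent set."""
    return n / 5 + n * f(nu, eps)


def balance_epsilon(degrees):
    """Smallest eps such that all degrees lie in [d(1-eps), d(1+eps)],
    attained for d = (min + max) / 2."""
    lo, hi = min(degrees), max(degrees)
    if hi == 0:
        raise ValueError("graph has no edges")
    return (hi - lo) / (hi + lo)
```

For example, with $\nu = \varepsilon = 0.01$ the value of $f$ is below $1/20$, so the bound stays below $n/4$.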

Proof. Let $S$ be an independent subset of vertices of $G$. We will bound $|S|$.
Let $T = V \setminus S$. Let $\chi_S, \chi_T$ be the characteristic functions
(represented as column vectors) of $S, T$ respectively (i.e. taking the value 1
inside the set and 0 outside the set). As $S$ is an independent set and $G$ is
$\varepsilon$-balanced, we have

$d(1-\varepsilon)|S| \leq |\text{edges leaving } S| = \langle A_G\chi_S, \chi_T \rangle.$    (2)

Note that $A_G\chi_S$ is the column vector whose $j$'th entry is the number of
edges going from vertex $j$ into the set $S$. Recall that $T = V \setminus S$ and
$\langle \cdot, \cdot \rangle$ is the standard inner product of two vectors. We
show further below that

$\langle A_G\chi_S, \chi_T \rangle \leq d(1+\varepsilon) \cdot (1/2 + \nu) \cdot \sqrt{|S||T|}.$    (3)

Abbreviating $\theta = |S|/n$ we get from (2) and (3) that
$\theta/(1-\theta) \leq (1/2 + g(\nu,\varepsilon))^2 = 1/4 + g(\nu,\varepsilon) + g(\nu,\varepsilon)^2$
where $g(\nu,\varepsilon) = (\varepsilon + (1+\varepsilon)\nu)/(1-\varepsilon)$, as
can be seen by elementary algebra. Note that $g(\nu,\varepsilon)$ goes to 0 when
$\nu$ and $\varepsilon$ approach 0. We set
$f(\nu,\varepsilon) = (4/5)(g + g^2)$ and get:
$\theta \leq (1/4)(4/5) + (4/5)(g + g^2) = 1/5 + f$, which is the theorem.

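We need to show inequality (3). Let $u_1, \ldots, u_n$ be an orthonormal basis of
the $n$-dimensional vector space over the reals where $u_i$ is an eigenvector of
$A_G$ with eigenvalue $\lambda_i$. We can decompose the adjacency matrix as
$A_G = \lambda_1 u_1 u_1^T + \lambda_2 u_2 u_2^T + \cdots + \lambda_n u_n u_n^T$,
where $u_i^T = (u_{i,1}, \ldots, u_{i,n})$ is the transpose of the column vector
$u_i$. Note that $\lambda_i (u_i u_i^T) v = \lambda_i v$ if $v = a \cdot u_i$ and
$\lambda_i (u_i u_i^T) v = 0$ for $v$ orthogonal to $u_i$. Let
$\mathcal{E} = A_G - \lambda_1 u_1 u_1^T = \sum_{i \geq 2} \lambda_i u_i u_i^T$,
and represent $\chi_S$ and $\chi_T$ over the basis of the $u_i$:
$\chi_S = \sum_{i=1}^n \alpha_i u_i$ and $\chi_T = \sum_{i=1}^n \beta_i u_i$.
Recall here the fact known as Parseval's equation:
$|S| = \|\chi_S\|^2 = \sum \alpha_i^2$ and $|T| = \|\chi_T\|^2 = \sum \beta_i^2$.
We get easily
$\langle A_G\chi_S, \chi_T \rangle = (\lambda_1 (u_1^T \chi_S)) \cdot (u_1^T \chi_T) + \langle \mathcal{E}\chi_S, \chi_T \rangle$
and proceed to bound both summands separately.

Because of the orthonormality of the $u_i$ we get:

$\langle \mathcal{E}\chi_S, \chi_T \rangle = \sum_{i>1} \lambda_i\alpha_i\beta_i \leq \lambda \sum_{i>1} |\alpha_i\beta_i| \leq \lambda \sqrt{\sum_{i>1}\alpha_i^2} \sqrt{\sum_{i>1}\beta_i^2} \leq \lambda \sqrt{|S||T|} \leq \nu \cdot d(1+\varepsilon) \cdot \sqrt{|S||T|}$

where the last step holds because $\lambda \leq \nu\lambda_1$ and $\lambda_1$ is
bounded above by the maximum degree of the vertices, the last-but-one step uses
Parseval's equation and the last-but-two the Cauchy-Schwarz inequality,
$\sum |\alpha_i\beta_i| \leq \sqrt{\sum \alpha_i^2} \cdot \sqrt{\sum \beta_i^2}$.

Now we come to the other summand, $(\lambda_1 (u_1^T \chi_S)) \cdot (u_1^T \chi_T)$.
Let $\alpha, \beta$ be the average values of $u_1$ on $S, T$ respectively, that is
$\alpha = (\sum_{j \in S} u_{1,j})/|S|$. With the inequality of Cauchy-Schwarz we
get:

$\alpha^2 = \frac{(\sum_{j \in S} u_{1,j} \cdot 1)^2}{|S|^2} \leq \frac{(\sum_{j \in S} u_{1,j}^2) \cdot (\sum_{j \in S} 1)}{|S|^2} = \frac{\sum_{j \in S} u_{1,j}^2}{|S|},$

which implies $\alpha^2|S| \leq \sum_{j \in S} u_{1,j}^2$. As $T = V \setminus S$ we
get

$\alpha^2|S| + \beta^2|T| \leq \sum_{j=1}^n u_{1,j}^2 = \|u_1\|^2 = 1.$

Using the fact that the geometric mean is bounded by the arithmetic mean this
implies $\alpha\sqrt{|S|} \cdot \beta\sqrt{|T|} \leq 1/2$. (The weakness of this
theorem undoubtedly comes from the pessimistic estimate used,
$\alpha\sqrt{|S|} \cdot \beta\sqrt{|T|} \leq (\alpha^2|S| + \beta^2|T|)/2$, which
is only close to the truth when $\alpha^2|S|$ is close to $\beta^2|T|$.) This
implies, as $\lambda_1$ is bounded above by the maximum degree, that

$\lambda_1 \cdot (u_1^T\chi_S) \cdot (u_1^T\chi_T) \leq d(1+\varepsilon) \cdot \alpha|S| \cdot \beta|T| \leq (1/2) \cdot d(1+\varepsilon) \sqrt{|S||T|}$

and we get (3), finishing the proof. □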

We next show that the graphs $G_F$ and $H_F$ are $\nu$-separated for a small
$\nu$. We do this by applying the trace method, see for example [Fr91], in an
elementary form. We first give a general outline of this method. For $A = A_G$
an adjacency matrix we have from linear algebra that for each $k \geq 0$
$\mathrm{Trace}(A^k) = \sum_{i=1}^n \lambda_i^k$. (The trace of a matrix is the
sum of the elements on the diagonal.) $\mathrm{Trace}(A^k)$ can be calculated from
the underlying graph as:
$\mathrm{Trace}(A^k) = |\{\text{closed walks of length } k \text{ in the underlying graph}\}|$.
A closed walk of length $k$ in $G$ is a walk like
$a_0 - a_1 - a_2 - \cdots - a_{k-1} - a_k = a_0$ with edges
$e_1, e_2, e_3, \ldots, e_{k-1}, e_k$. Note that the $e_i$ and $a_i$ need by no
means be distinct. As we assume the graph loopless we can only conclude that
$a_{i-1} \neq a_i$. For all even $k$ we have that all $\lambda_i^k \geq 0$ and we
get $\mathrm{Trace}(A^k) = \sum_{i=1}^n \lambda_i^k \geq \lambda_1^k + \max_{i>1} \lambda_i^k$.
Abbreviating $\lambda = \max_{i>1} |\lambda_i|$ we get further
$\lambda^k \leq \sum_{i=2}^n \lambda_i^k$. If the underlying matrix $A$ is the
adjacency matrix of a random graph this applies in particular to the expected
values:

$E[\lambda^k] \leq E[\mathrm{Trace}(A^k)] - E[\lambda_1^k] = E[\sum_{i=2}^n \lambda_i^k].$    (4)

Now assume that we have an underlying variable $n$ as in the $G_{n,p}$ model of
random graphs and that $E[\lambda^k] = o(\lambda_1^k)$ holds with high
probability. Then for each constant $\nu > 0$ we get
$\Pr[\lambda > \nu\lambda_1] = \Pr[\lambda^k > (\nu\lambda_1)^k]
= \Pr[\lambda^k > ((\nu\lambda_1)^k / E[\lambda^k]) \cdot E[\lambda^k]]
\leq \Pr[\lambda^k > (1/o(1)) \cdot E[\lambda^k]] + o(1) \leq o(1)$
where we apply Markov's inequality and use the fact that $\nu$ and $k$ are
constant. This says that the graphs considered are $\nu$-separated with high
probability. (The idea of considering the $k$-th power of the eigenvalues seems
to be to increase the gap between the largest eigenvalue and the remaining
eigenvalues.) The proof of the following lemma prepares for the more complex
situation with the graphs $G_F$ and $H_F$.
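The identity between the trace of $A^k$ and the number of closed walks of length $k$, on which the trace method rests, can be checked on small graphs. The sketch below is illustrative only: the function names are hypothetical, matrices are plain lists of lists, and the walk count is a brute-force enumeration over all $n^k$ vertex sequences.

```python
from itertools import product


def trace_of_power(A, k):
    """Trace of A^k for a square 0/1 matrix A given as a list of lists, k >= 1."""
    n = len(A)
    P = [row[:] for row in A]
    for _ in range(k - 1):
        P = [[sum(P[i][t] * A[t][j] for t in range(n)) for j in range(n)]
             for i in range(n)]
    return sum(P[i][i] for i in range(n))


def count_closed_walks(A, k):
    """Number of closed walks a_0 - a_1 - ... - a_k = a_0 of length k,
    counted with their starting vertex a_0."""
    n = len(A)
    count = 0
    for walk in product(range(n), repeat=k):
        if all(A[walk[i]][walk[(i + 1) % k]] for i in range(k)):
            count += 1
    return count
```

For the triangle, whose eigenvalues are 2, -1, -1, both functions give $2^4 + 1 + 1 = 18$ for $k = 4$.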

Lemma 2. Let $\nu > 0$ and $\delta > 0$ be constants. With high probability a
random graph from $G_{n,p}$ with $p = n^\delta/n$ is $\nu$-separated.

Proof. Let $A$ be the adjacency matrix of a random graph from $G_{n,p}$. Let $k$
be an even constant to be specified further below. We bound $E[\lambda^k]$ by
bounding $E[\mathrm{Trace}(A^k)] - E[\lambda_1^k]$, see (4). We calculate both
expectations separately. From concentration of the degree of each vertex we get
that with high probability $\lambda_1$ is between $(1-\varepsilon)n^\delta$ and
$(1+\varepsilon)n^\delta$. As the probability of failure is exponentially low in
$n$ and $k$ is constant we get
$E[\lambda_1^k] \geq (1-\varepsilon)^k n^{\delta k} - o(1)$. Next we come to the
expectation of the trace. For $a = (a_0, \ldots, a_{k-1}, a_k = a_0)$ let
$\mathrm{walk}(a)$ be the indicator random variable of the event that the walk
given by $a$ is possible in a random graph, that is all edges
$e_i = (a_{i-1},a_i) = \{a_{i-1},a_i\}$ for $1 \leq i \leq k$ occur. Then
$E[\mathrm{Trace}(A^k)] = \sum_a \Pr[\mathrm{walk}(a) = 1]$. To calculate the
preceding sum we distinguish three types of possible walks $a$. A walk is
distinct iff all edges $e_i$ are distinct. A walk is duplicated iff each edge
among the $e_i$ occurs at least twice. A walk is quasi-distinct iff some edges
among the $e_i$ occur at least twice and some only once. For $a$ distinct we have
that the expected number of such walks is bounded above by
$n^k \cdot (n^\delta/n)^k = n^{\delta k}$. (Compare to our estimate for
$E[\lambda_1^k]$ above.)

For $a$ duplicated we parameterize further with respect to the number $j$ with
$1 \leq j \leq k/2$ of distinct edges among the $e_i$. The number of possibilities
here is at most $k^{2k} \cdot n^{j+1}$ and for the expected number of duplicated
walks we get the upper bound

$\sum_{j=1}^{k/2} k^{2k} \cdot n^{j+1} \cdot (n^\delta/n)^j \leq \sum_{j=1}^{k/2} k^{2k} \cdot n \cdot n^{\delta j} \leq (k/2) \cdot k^{2k} \cdot n \cdot n^{\delta k/2}.$

Note that picking $k > 2/\delta$ implies that $1 + \delta k/2 < \delta k$ which in
turn implies that the bound is $o(n^{\delta k})$. Note that we must pick $k$
larger when $\delta$ gets smaller in order that the last statement holds. (This
is reassuring.)

For the number of quasi-distinct walks we first assume that the last edge,
$e_k$, is a unique edge of the walk. We get similarly to the preceding bound a
bound of $k \cdot k^{2k} \cdot n^{\delta(k-1)}$. As the last edge need not always
be unique we get an additional factor of $k$. The estimate is always
$o(n^{\delta k})$ as $\delta(k-1) < \delta k$. Summing all preceding estimates we
get $E[\mathrm{Trace}(A^k)] \leq n^{\delta k} + o(n^{\delta k})$ and
$E[\lambda^k] \leq E[\mathrm{Trace}(A^k)] - E[\lambda_1^k] \leq (1 - (1-\varepsilon)^k) \cdot n^{\delta k} + o(n^{\delta k})$.
As $\varepsilon > 0$ can be chosen arbitrarily small, $k$ is constant and the
preceding estimate holds whenever $n$ is sufficiently large, this means that
$E[\lambda^k] = o(n^{\delta k}) = o(\lambda_1^k)$ with high probability. By the
general principle above the graphs from $G_{n,p}$ are $\nu$-separated with high
probability. □

As graphs from $G_{n,p}$ are $\varepsilon$-balanced for any $\varepsilon > 0$,
Theorem 3 implies that we can efficiently certify that a random graph from
$G_{n,p}$ has no independent set with much more than $n/5$ vertices with high
probability. The treatment of our graphs $G_F$, $H_F$ based on the method above
is more technical but otherwise the same.



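Theorem 4. For $F \in \mathrm{Form}_{n,p,3}$ let $A_F$ be the adjacency matrix of
$G_F$ and let $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_{n^2}$ be the
eigenvalues of $A_F$. Then $E[\sum_{i=1}^{n^2} \lambda_i^k]$ is equal to

$E[\mathrm{Trace}(A_F^k)] \leq (2n^{1-2\gamma})^k + c \cdot k^2 \cdot k^{2k} \cdot 2^k \cdot n^{2 + (1-2\gamma)k/2}$

where $c$ is a constant ($c = 100$ should be enough). If $k > 4/(1-2\gamma)$ the
preceding estimate is $(2n^{1-2\gamma})^k + o((2n^{1-2\gamma})^k)$ (compare
Corollary 1). The same applies to $H_F$.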



Proof. For any $F$ we have that
$\mathrm{Trace}(A_F^k) = |\{\text{closed walks of length } k \text{ in } G_F\}|$.
A typical closed walk of length $k$ is
$(a_0,b_0) - (a_1,b_1) - \cdots - (a_{k-1},b_{k-1}) - (a_k,b_k) = (a_0,b_0)$.
Now consider a step $(a_{i-1},b_{i-1}) - (a_i,b_i)$ of this walk. For this step
to be possible in $G_F$ the formula $F$ must have one of the following $2n$ pairs
of clauses: $a_{i-1} \vee a_i \vee z$, $b_{i-1} \vee b_i \vee \neg z$ for a
propositional variable $z$, or the other way round, that is $a_i, b_i$ first. We
say that pairs of the first type induce the step
$(a_{i-1},b_{i-1}) - (a_i,b_i)$ with sign $+1$ whereas the second type induces
this step with sign $-1$. For two sequences of clauses
$C = (C_1, C_2, \ldots, C_k)$, where the last literal of each $C_i$ is a positive
literal, and $D = (D_1, D_2, \ldots, D_k)$, where the last literal of each $D_i$
is negative, and a sequence of signs
$\varepsilon = (\varepsilon_1, \ldots, \varepsilon_k)$ we say that
$C, D, \varepsilon$ induce the walk above iff for each $i$ the pair of clauses
$C_i, D_i$ induces the $i$'th step of the walk with sign given by
$\varepsilon_i$. Note that the occurrences of the clauses $D_i$ and $C_j$ in a
random $F$ are independent as these clauses are always distinct. We say that $F$
induces the walk above iff we can find sequences of clauses $C, D \subseteq F$
(the $C_i, D_i$ need not necessarily be all distinct) and a sequence of signs
$\varepsilon$ such that $C, D, \varepsilon$ induce the given walk. We observe:
First, $G_F$ allows for a given walk iff $F$ induces this walk as defined above.
Second, three sequences $C, D, \varepsilon$ induce at most one walk, but one walk
can be induced by many $C, D, \varepsilon$'s. (Without the $\varepsilon$ it is
possible that $C, D$ induce several walks.) Thus we get that
$\mathrm{Trace}(A_F^k)$ can be bounded above by

$|\{C, D, \varepsilon \text{ with } C, D \subseteq F \text{ inducing a closed walk of length } k\}|$

and this transfers to the expectations over random formulas $F$. The notions
distinct, quasi-distinct, and duplicated transfer naturally from graphs to
$C, D$. We decompose the expected number of $C, D, \varepsilon$'s which induce a
closed walk of length $k$ according to all combinations of $C, D$ being distinct,
quasi-distinct or duplicated. The reader with some experience can now easily fill
in the remaining technical details along the lines of the proof of Lemma 2. □



Now our algorithm is obvious: We pick $\varepsilon, \nu$ sufficiently small such
that the $f(\nu,\varepsilon)$ from Theorem 3 is $< 1/20$ (because
$1/5 + 1/20 = 1/4$). Given $F \in \mathrm{Form}_{n,p}$ where
$p = 1/n^{1+\gamma}$ we construct $G_F$. Corollary 2 and Theorem 4 imply that
$G_F$ is $\varepsilon$-balanced and $\nu$-separated with high probability. We
efficiently check if maximum degree/minimum degree
$\leq (1+\varepsilon)/(1-\varepsilon)$. This holds with high probability; in case
it does not, the algorithm fails. Now we determine $\lambda_1$ and $\lambda$ with
sufficient precision. We have that $\lambda \leq \nu\lambda_1$ with high
probability. If the last estimate does not hold, we fail. By Theorem 3 the
algorithm now has certified that $G_F$ has no independent set of size
$\geq n^2/4$. We do the same for $H_F$. With high probability we succeed and by
Theorem 1 $F$ is certified unsatisfiable.

Our algorithm works with high probability with respect to the binomial space
$\mathrm{Form}_{n,p}$ where $p$ is such that the expected number of clauses is the
announced $n^{3/2+\varepsilon}$. In case we want to show that it works for the
space $\mathrm{Form}_{n,m}$ with $m = n^{3/2+\varepsilon}$ additional consideration
is necessary: We would have to show that the algorithm fails in
$\mathrm{Form}_{n,p}$ only with probability $o(1/\sqrt{n})$. This is sufficient
because the Local Limit Theorem implies that the set of formulas in
$\mathrm{Form}_{n,p}$ having exactly the expected number of clauses has
probability bounded below by $\Omega(1/\sqrt{n})$. We leave the detailed
asymptotics required (we guess that $k$ must go slowly to infinity for this) to
the full version.



Acknowledgement. Helpful and detailed remarks of a referee improved the
presentation.






References 



[Ac2000] Dimitris Achlioptas. Setting 2 variables at a time yields a new lower
bound for random 3-SAT. In Proceedings SToC 2000.

[AcFr99] Dimitris Achlioptas, Ehud Friedgut. A threshold for random
k-colourability. Random Structures and Algorithms 1999.

[AlKa94] Noga Alon, Nabil Kahale. A spectral technique for colouring random
3-colourable graphs (preliminary version). In Proceedings SToC 1994.
ACM. 346-355.

[AlSp92] Noga Alon, Joel H. Spencer. The probabilistic method. Wiley & Sons
Inc. 1992.

[Be et al97] Paul Beame, Richard Karp, Toniann Pitassi, Michael Saks. On the
complexity of unsatisfiability proofs for random k-CNF formulas. 1997.

[BePi96] Paul Beame, Toniann Pitassi. Simplified and improved resolution lower
bounds. In Proceedings FoCS 1996. IEEE. 274-282.

[Bo85] Béla Bollobás. Random Graphs. Academic Press. 1985.

[Ch97] Fan R. K. Chung. Spectral Graph Theory. American Mathematical
Society. 1997.

[ChRe92] Vašek Chvátal, Bruce Reed. Mick gets some (the odds are on his side).
In Proceedings 33rd FoCS 1992. IEEE. 620-627.

[ChSz88] Vašek Chvátal, Endre Szemerédi. Many hard examples for resolution.
Journal of the ACM 35(4), 1988, 759-768.

[CrAu96] J. M. Crawford, L. D. Auton. Experimental results on the crossover
point in random 3SAT. Artificial Intelligence 81, 1996.

[Fr91] Joel Friedman. Combinatorica, 1991.

[Fr99] Ehud Friedgut. Necessary and sufficient conditions for sharp thresholds
of graph properties and the k-SAT problem. Journal of the American
Mathematical Society 12, 1999, 1017-1054.

[FrSu96] Alan M. Frieze, Stephen Suen. Analysis of two simple heuristics on a
random instance of k-SAT. Journal of Algorithms 20(2), 1996, 312-355.

[Fu98] Xudong Fu. The complexity of the resolution proofs for the random set
of clauses. Computational Complexity, 1998.

[Go96] Andreas Goerdt. A threshold for unsatisfiability. Journal of Computer
and System Sciences 53, 1996, 469-486.

[GoKr2000] Andreas Goerdt, Michael Krivelevich. Efficient recognition of random
unsatisfiable k-SAT instances by spectral methods. In Proceedings
STACS 2001. LNCS.

[ImNa96] Russell Impagliazzo, Moni Naor. Efficient cryptographic schemes
provably as secure as subset sum. Journal of Cryptology 9, 1996, 199-216.

[KiKrKr98] Lefteris M. Kirousis, Evangelos Kranakis, Danny Krizanc, Yiannis
Stamatiou. Approximating the unsatisfiability threshold of random
formulas. Random Structures and Algorithms 12(3), 1998, 253-269.

[PeWe89] A. D. Petford, Dominic Welsh. A randomised 3-colouring algorithm.
Discrete Mathematics 74, 1989, 253-261.

[Sch] Uwe Schöning. Logic for Computer Science. Birkhäuser.

[SeMiLe96] Bart Selman, David G. Mitchell, Hector J. Levesque. Generating hard
satisfiability problems. Artificial Intelligence 81(1-2), 1996, 17-29.

[St88] Gilbert Strang. Linear Algebra and its Applications. Harcourt Brace
Jovanovich, Publishers, San Diego. 1988.




Weisfeiler-Lehman Refinement Requires at Least 
a Linear Number of Iterations 



Martin Fürer*

Department of Computer Science and Engineering
Pennsylvania State University
University Park, PA 16802, USA
furer@cse.psu.edu,
http://www.cse.psu.edu/~furer



Abstract. Let $\mathcal{L}_{k,m}$ be the set of formulas of first order logic
containing only variables from $\{x_1, x_2, \ldots, x_k\}$ and having quantifier
depth at most $m$. Let $\mathcal{C}_{k,m}$ be the extension of
$\mathcal{L}_{k,m}$ obtained by allowing counting quantifiers $\exists i\, x_j$,
meaning that there are at least $i$ distinct $x_j$'s.

It is shown that for constants $h \geq 1$, there are pairs of graphs such
that $h$-dimensional Weisfeiler-Lehman refinement ($h$-dim W-L) can
distinguish the two graphs, but requires at least a linear number of
iterations. Despite this slow progress, $2h$-dim W-L only requires
$O(\sqrt{n})$ iterations, and $(3h-1)$-dim W-L only requires $O(\log n)$
iterations. In terms of logic, this means that there is a $c > 0$ and a class of
non-isomorphic pairs $(G_n^h, H_n^h)$ of graphs with $G_n^h$ and $H_n^h$ having
$O(n)$ vertices such that the same sentences of $\mathcal{L}_{h+1,cn}$ and
$\mathcal{C}_{h+1,cn}$ hold ($h+1$ variables, depth $cn$), even though $G_n^h$
and $H_n^h$ can already be distinguished by a sentence of $\mathcal{L}_{k,m}$
and thus $\mathcal{C}_{k,m}$ for some $k > h$ and $m = O(\log n)$.

Keywords: Graph Isomorphism Testing, Weisfeiler-Lehman Refine- 
ment, Games, Descriptive Complexity 



1 Introduction 

A simple and important preprocessing procedure for the graph isomorphism
problem is the $k$-dimensional Weisfeiler-Lehman refinement ($k$-dim W-L). The
algorithm tries to color $k$-tuples of vertices with different colors, if they
belong to different orbits of the automorphism group. This goal is not always
achieved. If two $k$-tuples have the same color, it is still possible that no
automorphism maps one to the other, but the algorithm has not discovered a
significant difference between the two $k$-tuples. On the other hand, if two
$k$-tuples have different colors, then they always belong to different orbits.

For $k = 1$, this is the straightforward vertex classification algorithm where
vertices are initially colored by their degrees. During every later refinement
step, each vertex is colored by the multi-set of the colors of its neighbors. The
process stops when no color class is split anymore.
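The case $k = 1$ can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's formulation: the function names are hypothetical, the graph is an adjacency dictionary, and each vertex's signature includes its own previous color next to the multi-set of neighbor colors (a common convention which guarantees that color classes only ever split).

```python
def color_refinement(adj):
    """1-dim W-L on a graph given as {vertex: list of neighbours}.

    Returns (colors, rounds), where rounds is the number of refinement
    steps that actually split some color class.
    """
    colors = {v: len(adj[v]) for v in adj}  # initial colors: degrees
    rounds = 0
    while True:
        signature = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        # Rename signatures to small integers to keep colors compact.
        palette = {sig: i for i, sig in enumerate(sorted(set(signature.values())))}
        new_colors = {v: palette[signature[v]] for v in adj}
        # Classes only split, so an unchanged class count means stability.
        if len(set(new_colors.values())) == len(set(colors.values())):
            return colors, rounds
        colors, rounds = new_colors, rounds + 1


def path_graph(n):
    """Path on vertices 0, ..., n-1 (a path of length n-1)."""
    return {v: [u for u in (v - 1, v + 1) if 0 <= u < n] for v in range(n)}
```

On a path, each round separates exactly one further distance class from the endpoints, so the number of rounds grows linearly with the number of vertices.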

The case $k = 2$ has also been well studied. It is edge coloring. The algorithm
starts with three classes of pairs of vertices: pairs $(u,v)$ with or without an
edge, and pairs $(u,u)$. During each refinement step, every directed edge $(u,v)$
is colored by the multi-set of pairs of colors on paths of length two from $u$ to
$v$.

As an example, consider the path of length $n-1$. Applying 1-dim W-L, the
vertices of distance $d$ from an endpoint receive their unique color during step
$d$. The algorithm stops when every vertex "knows" its distance from its closer
endpoint. Obviously, this requires $O(n)$ iterations. Using 2-dim W-L, distances
up to $2^s$ are measured in $s$ steps. After only $\log n$ steps, the color of
$(u,u)$ (which may be interpreted as the color of vertex $u$) determines the
distance of $u$ from the closer endpoint.

This and other examples suggest that for $k > 1$, $k$-dim W-L might always run
in just $O(\log n)$ rounds for graphs of size $n$. In particular, it is very
suggestive to make this conjecture for $k = 2$, because this case allows an
algebraic treatment. Indeed, it has initiated a vast development in algebra
(cellular algebras |llil| and coherent configurations [7]). It is easy to see
that 2-dim W-L corresponds to squaring a matrix $A$ of indeterminates and
replacing identical expressions by the same new indeterminate (starting with a
modified adjacency matrix where 3 different indeterminates are used for edges,
non-edges and diagonal elements).

Assume that, instead of this special "squaring" operation, one would do a
sequence of corresponding "multiplications" by $A$. As there can be at most
$n^2$ colors of vertex pairs, this process would stop at the latest with
$A^{n^2}$. All higher "powers" would be equal to this one. As a result of this
reasoning, one might jump to the conclusion that $O(\log n)$ squaring operations
were always sufficient. We will show in this paper that this is not at all the
case. This somewhat counterintuitive result is possible because the matrix
"product" just described is not associative.

Section 13 reviews some background information on the basic techniques con- 
necting Weisfeiler-Lehman refinement to logic and games. Section El presents 
the examples for which upper and lower bounds will be proved in Section O A 
simplified view of the pebble games is discussed in Section El 

2 The Cai-Fürer-Immerman Method

The strength of $k$-dim W-L has long been an open problem. It has been difficult
to find graphs for which $k$-dim W-L does not succeed immediately. Already
1-dim W-L identifies random graphs in linear time. For regular graphs, 1-dim W-L
cannot even get started. But 2-dim W-L is strong enough to identify shortest
cycles and classify the vertices by their distance from the set of vertices
covered by shortest cycles. Refining this classification is likely to identify
random regular graphs in linear time. It seemed reasonable to conjecture that
$f(k)$-dim W-L could identify all degree $k$ graphs for some slow growing
function $f$, e.g., $f(k) = k$. Cai, Fürer, and Immerman [2] have shown that this
is very far from the truth. Indeed $k = \Omega(n)$ is required for graphs of
degree 3. We use a modification of their counter-examples to produce graphs which
can be identified by $k$-dim W-L, but only after a linear number of iterations.

Cai, Fürer, and Immerman [2] exhibit an intimate connection between three
different approaches to the graph isomorphism problem. These approaches are
based on Weisfeiler-Lehman refinement, descriptional complexity, and a version
of Ehrenfeucht-Fraïssé games.

To understand the present paper, it is required to know many definitions and
techniques from the Cai, Fürer, and Immerman [2] paper. We start by reviewing
some of these notions and their applications.



2.1 Logic Background 

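Definition 1. For a given language $\mathcal{L}$, the graphs $G$ and $H$ are
$\mathcal{L}$-equivalent ($G \equiv_{\mathcal{L}} H$) iff the same sentences of
$\mathcal{L}$ hold for $G$ and $H$. Formally, this is expressed as

$$G \models \varphi \iff H \models \varphi$$

for all sentences $\varphi \in \mathcal{L}$.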

We say that $\mathcal{L}$ identifies the graph $G$ if $G \equiv_{\mathcal{L}} H$
implies that $G$ and $H$ are isomorphic.

We define $\mathcal{L}_k$ to be the set of first-order formulas $\varphi$ such
that the variables in $\varphi$ are a subset of $x_1, x_2, \dots, x_k$. To see
the full power of $\mathcal{L}_k$, one has to reuse the same variable many times
for different purposes in the same formula, a practice that is not very common
in everyday mathematics.

For example, consider the following sentence in $\mathcal{L}_2$.

$$\psi \equiv \forall x_1 \exists x_2 \big( E(x_1, x_2) \wedge \exists x_1 [\neg E(x_1, x_2)] \big)$$

The sentence $\psi$ says that every vertex is adjacent to some vertex which is
itself not adjacent to every vertex. Note that the first quantifier
($\forall x_1$) refers only to the free occurrence of $x_1$ within its scope.

The language $\mathcal{L}_k$ is weak in expressing quantitative properties. For
example, it is impossible to say that there are $k$ vertices of degree $k$. On
the other hand, it is possible to say that there are $k - 3$ vertices of degree
2, even though the formulation is somewhat cumbersome.

The language $\mathcal{C}_k$ is a natural extension of $\mathcal{L}_k$, enabling
such statements or making them more elegant. For every positive integer $i$,
$\mathcal{C}_k$ allows a quantifier $(\exists i\, x)$ with a straightforward
meaning. For example, $(\exists 3\, x)\varphi(x)$ means that there are at least
3 distinct vertices with property $\varphi$.
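Read operationally, a counting quantifier is a threshold on the number of
witnesses. The following Python sketch illustrates this reading on a small
graph. The helper name `exists_at_least` and the example star graph are
assumptions made for illustration only; they do not appear in the paper.

```python
def exists_at_least(i, vertices, phi):
    """Evaluate the counting quantifier (exists i x) phi(x) over `vertices`."""
    return sum(1 for x in vertices if phi(x)) >= i


# Hypothetical example graph: a star with center 0 and leaves 1, 2, 3.
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}


def is_leaf(x):
    """Property phi(x): the vertex x has degree 1."""
    return len(adj[x]) == 1


three_leaves = exists_at_least(3, adj, is_leaf)  # at least 3 vertices of degree 1
four_leaves = exists_at_least(4, adj, is_leaf)   # at least 4 vertices of degree 1
```

Nested counting quantifiers correspond to calling the helper inside the
predicate passed to an outer call.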

As an example, the following formula in $\mathcal{C}_2$ says that $x_1$ is
adjacent to at least two vertices of degree at least 7.

$$(\exists 2\, x_2)\big(E(x_1, x_2) \wedge (\exists 7\, x_1) E(x_1, x_2)\big)$$

2.2 Pebbling Games

Let $G$ and $H$ be two graphs, and let $m$ and $k$ be natural numbers. Define
the $m$-move $\mathcal{L}_k$ game on $G$ and $H$ as follows. There are two
players, and for each variable $x_i$ ($i = 1, \dots, k$), there is a pair of
pebbles labeled $x_i$. Initially, the pebbles lie outside the game board
containing the graphs.

In each move, Player I starts by selecting an $i \in \{1, \dots, k\}$ and
picking up the pair of $x_i$ pebbles. Then he places one of them on a vertex in
one of the graphs. Player I is free to select pebbles that have or have not
already been placed on the board. Player II must then place the other $x_i$
pebble on a vertex of the other graph.

To define win or loss, consider the subgraphs $G'$ and $H'$ of $G$ and $H$
induced by the pebbled vertices. The pebble respecting mapping $f$ (if it
exists) maps the vertex of $G'$ pebbled by $x_i$ to the vertex of $H'$ pebbled
by $x_i$. Player II loses if, after some move, $f$ does not exist or is not an
isomorphism of $G'$ and $H'$. Player I loses if Player II plays $m$ moves
without losing. Player II has a winning strategy for the $\mathcal{L}_k$ game
(without restriction on the number of moves) if she can play indefinitely
without losing against any strategy of Player I.
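The losing condition for Player II can be checked mechanically. The sketch
below is only an illustration under assumed data structures (edge lists and
dictionaries from pebble index to vertex); the function name
`player_two_survives` is hypothetical. It tests whether the pebble respecting
mapping exists and is an isomorphism of the induced subgraphs.

```python
def player_two_survives(edges_g, edges_h, pebbles_g, pebbles_h):
    """Return True iff the pebble respecting mapping is a partial isomorphism.

    pebbles_g / pebbles_h map a pebble index i to the vertex carrying the
    x_i pebble in G / H (only pebbles currently on the board are listed).
    """
    eg = {frozenset(e) for e in edges_g}
    eh = {frozenset(e) for e in edges_h}
    if sorted(pebbles_g) != sorted(pebbles_h):
        raise ValueError("both graphs must carry the same pebble indices")
    for i in pebbles_g:
        for j in pebbles_g:
            a, b = pebbles_g[i], pebbles_g[j]
            c, d = pebbles_h[i], pebbles_h[j]
            # The mapping must be well defined and injective.
            if (a == b) != (c == d):
                return False
            # Edges and non-edges must be preserved.
            if (frozenset((a, b)) in eg) != (frozenset((c, d)) in eh):
                return False
    return True


# Hypothetical example: a triangle versus a path on three vertices.
triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2)]
ok = player_two_survives(triangle, path, {1: 0, 2: 1}, {1: 0, 2: 1})
lost = player_two_survives(triangle, path, {1: 0, 2: 2}, {1: 0, 2: 2})
```

In the second call the pebbled pair is an edge in the triangle but a non-edge
in the path, so Player II has lost.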

Some authors call Player II the duplicator, because she wants the two graphs 
to look the same. They call Player I the spoiler, as he tries to interfere with this 
goal. 

Theorem 1. Player II has a winning strategy for the $\mathcal{L}_k$ game on
$G, H$ iff $G \equiv_{\mathcal{L}_k} H$.

A modification of the $\mathcal{L}_k$ games provides a combinatorial tool for
analyzing the expressive power of $\mathcal{C}_k$. The game board looks the
same, and winning is defined as for $\mathcal{L}_k$. Just as in the
$\mathcal{L}_k$ game, the two players use $k$ pairs of pebbles. The difference
is that each move now has two parts.

- Player I picks up the $x_i$ pebble pair for some $i$ and selects a set $A$ of
  vertices from one of the graphs. Player II answers with a set $B$ of vertices
  from the other graph such that $|B| = |A|$.

- Player I places one of the $x_i$ pebbles on some vertex $v \in B$. Player II
  answers by placing the other $x_i$ pebble on some $u \in A$.

We interpret the first part of a move as an assertion of Player I that there
exist $|A|$ vertices in $G$ with a certain property. Player II answers with the
same number of such vertices in $H$. Player I challenges one of the vertices in
$B$ and Player II replies with an equivalent vertex from $A$. Note that it is
never an advantage for Player I to include vertices with obviously different
properties in $A$. Again, games and logic are just two sides of the same coin.

Theorem 2. Player II has a winning strategy for the $\mathcal{C}_k$ game on
$G, H$ if and only if $G \equiv_{\mathcal{C}_k} H$.

2.3 Weisfeiler-Lehman Refinement

One-dimensional Weisfeiler-Lehman refinement (1-dim W-L) is just vertex clas- 
sification, first by the degree and then by the multi-set of colors of the neighbors, 
until no color class is split anymore. 
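The description above translates almost directly into code. The following is a
minimal Python sketch of 1-dim W-L under an assumed adjacency-dictionary
representation; the function name `color_refinement` is chosen here for
illustration and is not taken from the paper.

```python
from collections import Counter


def color_refinement(adj):
    """1-dim W-L: adj maps each vertex to an iterable of its neighbors.

    Returns a dict vertex -> color id of the stable coloring.
    """
    color = {v: len(adj[v]) for v in adj}  # first classification: by degree
    while True:
        # New signature: old color plus multiset of the neighbors' colors.
        sig = {
            v: (color[v], tuple(sorted(Counter(color[w] for w in adj[v]).items())))
            for v in adj
        }
        ids = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new = {v: ids[sig[v]] for v in adj}
        # The new partition refines the old one, so equal class counts
        # mean that no color class was split.
        if len(set(new.values())) == len(set(color.values())):
            return new
        color = new


# Hypothetical examples: a path on 4 vertices and a 6-cycle.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
path_colors = color_refinement(path)
cycle_colors = color_refinement(cycle)
```

On the regular 6-cycle every vertex keeps the same color, which matches the
remark that 1-dim W-L cannot get started on regular graphs.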

For $k > 1$, $k$-dim W-L is defined as follows. Let $G$ be a graph and let
$u = (u_1, \dots, u_k)$ be a $k$-tuple of vertices of $G$. The initial color
$W^0(u)$ is defined according to the isomorphism type of $u$. That is,
$W^0(u) = W^0(v)$ iff

$$\forall i\, \forall j\, \big( (u_i, u_j) \in E \iff (v_i, v_j) \in E \big).$$

For each vertex $w$, we define

$$\mathrm{sift}^t(u, w) = \big( W^t(w, u_2, u_3, \dots, u_{k-1}, u_k),\ W^t(u_1, w, u_3, \dots, u_{k-1}, u_k),\ \dots,\ W^t(u_1, u_2, u_3, \dots, w, u_k),\ W^t(u_1, u_2, u_3, \dots, u_{k-1}, w) \big).$$

Thus $\mathrm{sift}^t(u, w)$ is the $k$-tuple of $W^t$-colors of the $k$-tuples
of vertices obtained by substituting vertex $w$ in turn for each of the $k$
occurrences of a vertex in the $k$-tuple $u$.

At time $t + 1$, the new colors $W^{t+1}(u)$ and $W^{t+1}(v)$ are the same if
$W^t(u) = W^t(v)$ and the number of $w$'s for which $\mathrm{sift}^t(u, w)$ has
any specific value is the same as the number of $w$'s for which
$\mathrm{sift}^t(v, w)$ has that same value.

Finally, $W(u)$ is the stable color of $u$. It is obtained after at most $n^k$
iterations, i.e., $W(u) = W^{n^k}(u)$.
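The definition of the sift operation can be turned into a brute-force
refinement procedure. The Python sketch below is meant only to illustrate the
definition on tiny graphs; the names `k_dim_wl` and `_canon` are hypothetical,
vertices are assumed to be numbered 0 to n-1, and the initial color also
records which coordinates of a tuple coincide (an assumption made here so that
the initial color captures the full isomorphism type).

```python
from collections import Counter
from itertools import product


def _canon(sig):
    """Replace arbitrary (sortable) signatures by small integer color ids."""
    ids = {s: i for i, s in enumerate(sorted(set(sig.values())))}
    return {u: ids[sig[u]] for u in sig}


def k_dim_wl(n, edges, k):
    """Return (stable coloring of all k-tuples, number of refining rounds)."""
    adj = {(a, b) for a, b in edges} | {(b, a) for a, b in edges}
    tuples = list(product(range(n), repeat=k))

    def iso_type(u):
        # Equality pattern and adjacency pattern of the tuple u.
        return tuple((u[i] == u[j], (u[i], u[j]) in adj)
                     for i in range(k) for j in range(k))

    color = _canon({u: iso_type(u) for u in tuples})
    rounds = 0
    while True:
        new = {}
        for u in tuples:
            # Multiset of sift values: substitute w at each of the k positions.
            sifts = Counter(
                tuple(color[u[:i] + (w,) + u[i + 1:]] for i in range(k))
                for w in range(n)
            )
            new[u] = (color[u], tuple(sorted(sifts.items())))
        new = _canon(new)
        # new refines color, so equal class counts mean the coloring is stable.
        if len(set(new.values())) == len(set(color.values())):
            return color, rounds
        color, rounds = new, rounds + 1


# Hypothetical example: 2-dim W-L on the path 0 - 1 - 2.
colors, rounds = k_dim_wl(3, [(0, 1), (1, 2)], 2)
```

Each round costs on the order of $n^{k+1} k$ color lookups, so the sketch is
practical only for very small $n$ and $k$; the value `rounds` is the quantity
whose growth the paper studies.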

Building on previous work [9, 8], the following result shows the close
connection between logic, games, and Weisfeiler-Lehman refinement. Here, the
formulas are allowed to have free variables, which are interpreted by the
$k$-tuples $u$ and $v$ respectively.

Theorem 3. Let $G, H$ be a pair of colored graphs and let $(u, v)$ be a
$k$-configuration on $G, H$, where $k > 1$. Then the following are equivalent:

1. $W^m(u) = W^m(v)$ for $k$-dim W-L.

2. $G, u \equiv_{\mathcal{C}_{k+1,m}} H, v$, where $\mathcal{C}_{k+1,m}$ denotes
the formulas of $\mathcal{C}_{k+1}$ of quantifier depth at most $m$.

3. Player II has a winning strategy for the $m$-move $\mathcal{C}_{k+1}$ game on
$(G, H)$ whose initial configuration is $(u, v)$.



3 An Example Where k-Dim W-L Is Slow

Our construction of counter-examples starts with a graph $G_n^h$ (see Figure 1),
which we call the global graph. We modify $G_n^h$ to obtain 2 graphs
$X(G_n^h)$ and $\tilde{X}(G_n^h)$ (“X twist of $G_n^h$”) which are difficult to
distinguish by $k$-dim W-L. For the purpose of forcing $k$ to be big, the global
graph has been chosen as an expander in [2]. For this paper, we choose the
rather simple grid graph $G_n^h$.

Now we describe how to modify $G_n^h$ to obtain $X(G_n^h)$. Every vertex of
degree $d$ of $G_n^h$ is replaced by $2^{d-1}$ vertices, which we want to view
as the corners of a $d$-dimensional cube with an even number of coordinates
being 1. We refer to these vertices as a half-cube or meta-vertex (see
Figure 2).

Fig. 1. The global graph with $kn + 1$ vertices, where $k > 1$ is a constant.
[Figure not reproduced; surviving labels: vertices $v$ and $u$, $n$ columns
($n > k$).]

Fig. 2. This figure shows a meta-vertex $y$ and its 3 neighbors $u$, $x$, and
$z$. All 4 meta-vertices correspond to vertices of degree 3 in $G_n^h$. They are
therefore represented by 3-dimensional half-cubes. For each meta-vertex, only
the 4 dark points are vertices. The 4 white points and the dashed lines are just
there to illustrate the cubes. Note the 3 different types of connections of $y$
to its neighbors. The connections to $u$ are left to left and right to right.
The connections to $x$ are front to front and back to back. The connections to
$z$ are top to top and bottom to bottom. A top to left and bottom to right
connection would also be fine, as long as every vertex of degree 3 is
represented by a meta-vertex whose connections represent the 3 basic
partitions: left-right, top-bottom, and front-back.

We might denote the vertices of the half-cube at a vertex $v$ of degree 3 of
$G_n^h$ by $v(0,0,0)$, $v(0,1,1)$, $v(1,0,1)$, $v(1,1,0)$. If two vertices $u$
and $v$ of $G_n^h$ are adjacent, then their half-cubes are connected as follows.
Say $u$ is of degree 4 and $\{u, v\}$ is the third edge of $u$, and $v$ is of
degree 3 and $\{v, u\}$ is the first edge of $v$. Then, for all
$i_1, i_2, i_4, j_2, j_3, b \in \{0, 1\}$, the vertex $u(i_1, i_2, b, i_4)$ is
adjacent to $v(b, j_2, j_3)$ (provided these vertices exist, i.e., the sum of
their coordinates is even).

$\tilde{X}(G_n^h)$ is constructed almost exactly as $X(G_n^h)$, with one
exception. We say that one edge of $G_n^h$ is twisted.

Definition 2. To twist an edge $\{u, v\}$ of the global graph $G_n^h$ means to
replace every edge between the meta-vertex $u$ and the meta-vertex $v$ by a
non-edge, and every non-edge by an edge.

It is not difficult to see that $X(G_n^h)$ and $\tilde{X}(G_n^h)$ are not
isomorphic. We cannot make the twist disappear, but we can move it around to any
edge of the connected global graph $G_n^h$. For example, mapping
$u(i_1, i_2, i_3, i_4)$ to $u(1 - i_1, i_2, i_3, 1 - i_4)$ moves a twist from
the first edge of $u$ to its fourth edge.
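The half-cube and the coordinate flip used to move a twist are easy to
enumerate. The following Python sketch is only an illustration; the function
names `half_cube` and `flip` are hypothetical, and coordinates are indexed
from 0, so the first and fourth coordinates are positions 0 and 3.

```python
from itertools import product


def half_cube(d):
    """Even-weight 0/1 vectors of length d: the 2^(d-1) vertices of a meta-vertex."""
    return [v for v in product((0, 1), repeat=d) if sum(v) % 2 == 0]


def flip(v, i, j):
    """Complement coordinates i and j (0-based) of the 0/1 vector v."""
    return tuple(1 - x if t in (i, j) else x for t, x in enumerate(v))


corners3 = half_cube(3)                               # meta-vertex of a degree 3 vertex
corners4 = half_cube(4)                               # meta-vertex of a degree 4 vertex
image4 = sorted(flip(v, 0, 3) for v in corners4)      # flip first and fourth coordinate
```

Flipping two coordinates preserves the parity of the coordinate sum, so the map
permutes the half-cube. It complements exactly the coordinates that govern the
connections along the first and fourth edge, which is why a twist on one of
these edges is exchanged for a twist on the other.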

4 The Global Game 

The graphs $X(G_n^h)$ and $\tilde{X}(G_n^h)$ are nicely structured. Nevertheless,
it is somewhat complicated to analyze the games played on them. Therefore, we
investigate a simpler game $\mathcal{G}_k$ that can still adequately describe
the original game $\mathcal{C}_k$. The new game is played on the global graph
$G_n^h$ rather than the pair $(X(G_n^h), \tilde{X}(G_n^h))$. We therefore call
it the global game.

The moves of Player I are very much the same as before. He picks up one of his
$k$ pebbles and puts it on a vertex of $G_n^h$. The moves of Player II are of a
very different kind. To describe them, we introduce the following notion of
connected components of edges in $G_n^h$.

Definition 3. The edges $e, e'$ are connected if there is a path
$v_0, v_1, \dots, v_{\ell-1}, v_\ell$ in $G_n^h$ with $e = \{v_0, v_1\}$,
$e' = \{v_{\ell-1}, v_\ell\}$, and none of the interior vertices
$v_1, v_2, \dots, v_{\ell-1}$ is holding a pebble.

We just use the term component when we mean a connected component of edges.

A move of Player II just consists of declaring certain components as twisted.
The game $\mathcal{G}_k$ starts with no pebbles on the board, and the only
component being twisted. At any time, the number of twisted components is odd.
When Player I picks up a pebble from the board, two or more components might
merge into one component. The new component is twisted iff the number of
merged twisted components was odd. When Player I places a pebble on the
board, one component might be replaced by two or more newly formed
components. Player II declares the new components as twisted or straight, with
the only restriction that the parity of the number of twisted components does
not change. When a move of Player I does not split any component, the
answer of Player II consists of doing nothing.

Definition 4. If a twisted component has size 1, then we say the twist is
trapped.

Player II loses the global game $\mathcal{G}_k$ as soon as any twist is trapped.
Player II wins the $m$-move game if she can make $m$ moves without losing.
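The bookkeeping of the global game amounts to maintaining components of edges
under pebble placements. The Python sketch below illustrates this with a small
union-find; the names `edge_components` and `twist_trapped` are hypothetical,
and it is an assumption of this sketch that the size in Definition 4 counts the
edges of a component.

```python
def edge_components(edges, pebbled):
    """Group edges into components: edges sharing an unpebbled endpoint are joined."""
    parent = list(range(len(edges)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_at = {}  # unpebbled vertex -> index of the first edge seen at it
    for idx, (a, b) in enumerate(edges):
        for v in (a, b):
            if v in pebbled:
                continue  # a pebbled vertex separates the edges meeting at it
            if v in first_at:
                parent[find(idx)] = find(first_at[v])
            else:
                first_at[v] = idx
    groups = {}
    for idx, e in enumerate(edges):
        groups.setdefault(find(idx), []).append(e)
    return list(groups.values())


def twist_trapped(components, twisted):
    """True iff some twisted component (given by index) consists of one edge."""
    return any(len(components[i]) == 1 for i in twisted)


# Hypothetical example: the path 0 - 1 - 2 - 3 - 4.
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
one_pebble = edge_components(path_edges, {2})
two_pebbles = edge_components(path_edges, {1, 2})
```

With pebbles on vertices 1 and 2, the edge (1, 2) forms a component by itself,
so a twist declared on that component would be trapped.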

Intuitively, the original game $\mathcal{C}_k$ and the new global game
$\mathcal{G}_k$ are equivalent, because of the following reasoning.

- Player I does not really have a reason to place a pebble on any node other
  than the origin $u(0, \dots, 0)$ of a meta-vertex $u$. So we might just view
  him as placing the pebble on the meta-vertex (or the corresponding vertex of
  the global graph $G_n^h$).

- Unless she risks inevitable defeat, Player II had better place her pebble on
  the corresponding meta-vertex. Thus, no selection of a meta-vertex has to be
  done by Player II. She just selects among the vertices of the given
  meta-vertex. She does this by selecting twists to move her choice on $u$ into
  the origin of $u$.

- Here, we only consider graphs $G_n^h$ without any non-trivial automorphisms.
  Furthermore, every vertex can easily be identified. Therefore, the global
  game can be played $\mathcal{L}$-like rather than $\mathcal{C}$-like. No
  player makes any claims about the existence of more than one vertex with
  certain properties.

In summary, we are not claiming that every play of the original game
$\mathcal{C}_k$ could be simulated by the global game $\mathcal{G}_k$, but we
will show that it is of no significant disadvantage for a player to play in a
way that can be so simulated.

Definition 5. Player II plays proper in the game $\mathcal{C}_k$ if, after any
of her moves, it is possible to apply an odd number of twists to $X(G_n^h)$
such that there is a pebble respecting isomorphism between $\tilde{X}(G_n^h)$
and the modified graph $X(G_n^h)$.

In particular, if Player II plays proper, then she answers every move by a
move in the corresponding meta-vertex. Likewise, she answers any set of
potential moves by a set of potential moves in corresponding meta-vertices. She
is further restricted in placing a pebble within a meta-vertex, should Player I
place more than one pebble on the same meta-vertex.

Our graphs $G_n^h$ have the property that there is a unique vertex $u$ of
degree 1, distinguishing its neighbor $v$, and a unique vertex $w$ of degree 2
at distance $h$ from $u$. All the other vertices are characterized by their
distances from $v$ and $w$.

Lemma 1. Let the number of pebbles be at least 3. If at any time Player II
does not play proper in $\mathcal{C}_k$, then Player I can force a win in
$O(\log n)$ additional moves.

Proof. The unique characterization of the vertices in $G_n^h$ implies that some
distance is wrong whenever Player II selects a non-matching meta-vertex. With 3
pebbles, Player I can easily exhibit the shorter distance in $O(\log n)$ moves
by a divide-and-conquer approach. Hereby, Player I might have a need to identify
the vertices $u$ or $w$ of $G_n^h$. As these vertices are (partly) characterized
by their degrees, Player I will use the full power of $\mathcal{C}_k$-moves as
follows. When Player II matches a low degree vertex by a high degree vertex,
Player I proposes the set of neighbors of the high degree vertex, and Player II
has no appropriate answer.

Assume now that Player II has always played in the correct meta-vertices,
but no set of twists can produce a pebble respecting isomorphism. Then it is not
hard to see that there has to be an inconsistency within a meta-vertex
containing multiple pebbles. E.g., $X(G_n^h)$ might have 2 pebbles in the front
of the half-cube, while $\tilde{X}(G_n^h)$ has one of the corresponding pebbles
in the front and one in the back. By selecting in that neighboring meta-vertex
which distinguishes front from back, Player I wins in one move.

□

As it does not pay off for Player II to play improper, we can focus now on
the case where Player II always plays proper.

Theorem 4. Assume Player II is restricted to play proper in the game
$\mathcal{C}_k$ on the pair $(X(G_n^h), \tilde{X}(G_n^h))$. Then a player has a
strategy to win the $m$-move $\mathcal{C}_k$ game on the pair
$(X(G_n^h), \tilde{X}(G_n^h))$ if and only if that player has a strategy to
win the $m$-move global $\mathcal{G}_k$ game.

Proof. We have to prove four parts.

(a) Player I wins the $m$-move $\mathcal{C}_k$ game on the pair
$(X(G_n^h), \tilde{X}(G_n^h))$. In the simulating global game $\mathcal{G}_k$,
Player I only has to be specific about the selection of meta-vertices, but not
about his choice within any meta-vertex, while Player II still shows her
complete selection. Thus Player I can follow his old winning strategy. When
Player I wins the simulated game, some pair of pebbles is adjacent in one copy,
but not adjacent in the other one. These two pairs correspond to a trapped twist
in $G_n^h$, indicating a win in the simulating game too.

(b) Player I wins the $m$-move $\mathcal{G}_k$ game on the pair
$(X(G_n^h), \tilde{X}(G_n^h))$. In the simulating game $\mathcal{C}_k$, Player I
has to make choices within meta-vertices. He always chooses the origin. A
trapped twist in the global game $\mathcal{G}_k$ corresponds to an edge
vs. non-edge pair in the simulating game, implying a win too.

(c) Player II wins the $m$-move $\mathcal{C}_k$ game on the pair
$(X(G_n^h), \tilde{X}(G_n^h))$. As Player II is restricted to proper plays,
there is always a placement of twists onto the edges such that her moves exactly
match the moves of Player I. The placements of twists on edges determine a
unique parity of twists in each component, producing the simulating move of
Player II. The simulating move produces no conflict if the simulated move did
not.

(d) Player II wins the $m$-move $\mathcal{G}_k$ game on the pair
$(X(G_n^h), \tilde{X}(G_n^h))$. The moves of Player II in the
$\mathcal{G}_k$ game really describe her strategy to reply to any move of
Player I on the same meta-vertex. Player II just follows this strategy. □







5 Upper and Lower Bounds 

Theorem 5. The number of moves sufficient for Player I to win the game
$\mathcal{C}_k$ varies as follows depending on the number of pebbles.

(a) Player I has a winning strategy in the $\mathcal{C}_{3h}$ game on the pair
$(X(G_n^h), \tilde{X}(G_n^h))$ in $O(\log n)$ moves.

(b) Player I has a winning strategy in the $\mathcal{C}_{2h+1}$ game on the pair
$(X(G_n^h), \tilde{X}(G_n^h))$ in $O(\sqrt{n})$ moves.

(c) Player I has a winning strategy in the $\mathcal{C}_{h+1}$ game on the pair
$(X(G_n^h), \tilde{X}(G_n^h))$ in $O(n)$ moves.

Proof. It is sufficient to consider the corresponding $\mathcal{G}_k$ game. We
say that Player I builds a wall if he places pebbles on all vertices of a cut
disconnecting the leftmost column from the rightmost (full) column. For example,
the vertices of one column of $G_n^h$ form a wall.

(a) Having enough pebbles to produce 3 walls in $G_n^h$, Player I can employ a
divide-and-conquer strategy. The pebbles of one wall only have to be removed
when the twist is captured between the other 2 walls.

(b) Player I builds a new wall at distance $\sqrt{n}$ from the previous wall,
starting in the middle and moving towards the twist. As soon as a wall is built
that keeps the twist away from the other one, the old wall is no longer needed
and its pebbles can be reused. If the twist is located between two walls, then
Player I moves one of them slowly inside using the additional pebble.

(c) Player I builds a wall anywhere (best in the middle). Then he moves it
slowly towards the side containing the twist. □

Note that Player I can win the $\mathcal{G}_k$ game on $G_n^h$ by a
particularly simple winning strategy. He can build a wall on the left hand side
and move it towards the right hand side, step by step decreasing the size of the
component containing the twist. All moves of Player I are independent of the
moves of Player II.

Theorem 6. For $k \le h$, Player II has a winning strategy in the
$\mathcal{C}_k$ game on the pair of graphs $(X(G_n^h), \tilde{X}(G_n^h))$.

Proof. We may look at the corresponding $\mathcal{G}_k$ game. Even for $k = h$,
Player I has just enough pebbles to build a wall, but in the next move he has to
break it down again. Player II easily maintains a single twist, always in the
largest component. □

Corollary 1. $(h-1)$-dim W-L cannot detect a difference between the graphs
$X(G_n^h)$ and $\tilde{X}(G_n^h)$.

Corollary 2. $X(G_n^h)$ and $\tilde{X}(G_n^h)$ agree on all formulas of
$\mathcal{C}_h$.

Definition 6. The size of a component in $G_n^h$ is the number of empty columns
in it. A component is good if its size is positive.

Theorem 7. For $k \le 2h$, every winning strategy of Player I in the
$\mathcal{G}_k$ game on $G_n^h$ requires at least $\Omega(n)$ moves.

Proof. Let us start with the trivial observation that in $G_n^h$ there are $h$
vertex-disjoint paths between any pair of distinct good components. Thus there
is a wall consisting of at least $h$ pebbled vertices between these components.
Thus with at most $2h$ pebbles, there are at any time at most 3 good components.

We now want to describe a strategy for Player II that sufficiently delays
a win of Player I. In this strategy, Player II always maintains just a single
twist. Assume that one good component $C_1$ of size $s_1$ exists, and another
good component containing the twist is just split into good components $C_2$,
$C_3$ with sizes $s_2$ and $s_3$ respectively. Let $C_2$ be the component
between $C_1$ and $C_3$. Then Player II puts the twist into $C_3$ if
$s_1 + s_2 < s_3$, and otherwise into $C_2$. The following removal of any pebble
by Player I breaks a wall, again producing 2 components with the twist being in
a component of size at least $s_3$.

When two good components are formed after $m'$ moves, the twist is in the
larger component of size at least $(n - m')/2$. After $m$ moves, the twist is
usually in a component of size at least
$(n - m')/2 - (m - m') = (n + m')/2 - m \ge n/2 - m$.
There is an exception for the isolated times when 3 components exist, in which
case $n/2 - m$ is a lower bound on the sum of the sizes of the middle and any
outer component. Player II does not lose before the twist is in a bad component
(of size 0). Thus the number of moves is at least $n/2 = \Omega(n)$. □

Corollary 3. For $k \le 2h$, every winning strategy of Player I in the
$\mathcal{C}_k$ game on the pair $(X(G_n^h), \tilde{X}(G_n^h))$ requires at
least $\Omega(n)$ moves. □

Theorem 8. For $k \le 2h+1$, every winning strategy of Player I in the
$\mathcal{G}_k$ game on $G_n^h$ requires at least $\Omega(\sqrt{n})$ moves.

Proof. As in the proof of Theorem 7, there are at most 3 good components
at any time. When 2 good components are formed for the first time, a good
strategy for Player II is to move the twist into the larger one. When 3 good
components $C_1$, $C_2$, $C_3$ (with $s_i$ = size of $C_i$) are formed, she has
a choice between, say, $C_2$ and $C_3$, where $C_2$ is between $C_1$ and $C_3$.
She chooses $C_2$ if $s_2 > \sqrt{n}$ and $s_3 < n/2 - k$. (This selection could
be slightly better optimized without improving the theorem.) Consider the
integer $r$ defined by

$$r = \min(s_1 + s_2,\ s_3 + s_2,\ s_2 \sqrt{n})$$

if there are 3 good components and the twist is in $C_2$. If there are fewer
than 3 good components, then $r$ is defined to be the size of the larger or only
good component. When two components are formed from one, then $r$ gets a value
of at least $(n - k)/2$. Once the value of $r$ is less than $n/2 - k$, it can
never decrease by more than $\sqrt{n}$ in a single move. This can be shown by
case analysis, where the only interesting case is going from 2 good components
to 3. The $\Omega(\sqrt{n})$ lower bound follows immediately. □

Corollary 4. For $k \le 2h+1$, every winning strategy of Player I in the
$\mathcal{C}_k$ game on the pair $(X(G_n^h), \tilde{X}(G_n^h))$ requires at
least $\Omega(\sqrt{n})$ moves. □

A recent result of Grohe says that determining whether two graphs are
$\mathcal{C}_{k+1}$ equivalent, and thus whether they can be distinguished by
$k$-dimensional Weisfeiler-Lehman refinement, is P-complete. Grohe shows the
same result for $\mathcal{L}_{k+1}$ equivalence too. This does not imply, but
certainly strongly suggests, that $k$-dimensional Weisfeiler-Lehman refinement
is slow. Indeed, the method of Grohe could also be used to prove Theorem 7. It
seems that such a proof would be much more complicated than the proof given in
this paper.

Acknowledgment. I want to thank Luitpold Babel for an email conversation in 
1994 on some results that implicitly assumed associativity of the multiplication 
in coherent algebras. This has caused me to discover the main result of this 
paper.

Abstract. We continue the investigation of interactive proofs with
bounded communication, as initiated by Goldreich and Håstad (IPL
1998). Let $L$ be a language that has an interactive proof in which the
prover sends few (say $b$) bits to the verifier. We prove that the
complement $\overline{L}$ has a constant-round interactive proof of complexity
that depends only exponentially on $b$. This provides the first evidence that
for NP-complete languages, we cannot expect interactive provers to be much
more “laconic” than the standard NP proof.

When the proof system is further restricted (e.g., when $b = 1$, or when
we have perfect completeness), we get significantly better upper bounds
on the complexity of $L$.

Keywords: interactive proofs, Arthur-Merlin games, sampling proto- 
cols, statistical zero knowledge, game theory 

1 Introduction 

Interactive proof systems were introduced by Goldwasser, Micali and Rackoff
[GMR89] in order to capture the most general way in which one party can
efficiently verify claims made by another, more powerful party.^1 That is,
interactive proof systems are two-party randomized protocols through which a
computationally unbounded prover can convince a probabilistic polynomial-time
verifier of the membership of a common input in a predetermined language. Thus,
interactive proof systems generalize and contain as a special case the
traditional “NP-proof systems” (in which verification is deterministic and
“non-interactive”).

^1 [...] as general interactive proofs [GS89]. We warn that the latter assertion
refers to the entire class but not to refined complexity measures such as the
number of bits sent by the prover (considered below).

It is well-known that this generalization buys us a lot: The IP
Characterization Theorem of Lund, Fortnow, Karloff, Nisan and Shamir
[LFKN92, Sha92] states that every language in PSPACE has an interactive proof
system, and it is easy to see that only languages in PSPACE have interactive
proof systems.

It is well-known that the strong expressive power of interactive proofs is
largely due to the presence of interaction. In particular, interactive proofs in
which a single message is sent (as in NP-proofs) yield a complexity class
(known as MA) that seems very close to NP. It is interesting to explore what
happens between these extremes of unbounded interaction and no interaction.
That is, what is the expressive power of interactive proofs that utilize a
bounded, but nonzero, amount of interaction?



Interactive Proofs with Few Messages. The earliest investigations of the above
question examined the message complexity of interactive proofs, i.e., the number
of messages exchanged. (Sometimes we refer to rounds, each of which is a pair of
verifier-prover messages.) The Speedup Theorem of Babai and Moran [BM88]
(together with [GS89]) shows that the number of messages in an interactive proof
can always be reduced by a constant factor (provided the number of messages
remains at least 2). On the other hand, there is a large gap between
constant-round interactive proofs and unrestricted interactive proofs. As
mentioned above, all of PSPACE has a general interactive proof [LFKN92, Sha92].
In contrast, the class AM of problems with constant-round interactive proofs is
viewed as being relatively close to NP. Specifically, AM lies in the second
level of the polynomial-time hierarchy [BM88], cannot contain coNP unless the
polynomial-time hierarchy collapses [BHZ87], and actually equals NP under
plausible circuit complexity assumptions.



Laconic Provers. A more refined investigation of the above question was
initiated by Goldreich and Håstad [GH98], who gave bounds on the complexity of
languages possessing interactive proofs with various restrictions on the number
of bits of communication and/or randomness used. One of the restrictions they
considered, and the main focus of our investigation, limits the number of bits
sent from the prover to the verifier by some bound $b$. That is, what languages
can be proven by “laconic” provers?

Since the prover is trying to convey something to the verifier, this seems to
be the most interesting direction of communication. Moreover, for applications
of interactive proofs (e.g., in cryptographic protocols), it models the common
situation in which communication is more expensive in one direction (e.g., if
the prover is a handheld wireless device).

On one hand, we know of interactive proofs for several “hard” problems
(Quadratic Nonresiduosity [GMR89], Graph Nonisomorphism [GMW91], and others
[GK93, GG00, SV97]) in which the communication from the prover to the verifier
is severely bounded (in fact, to one bit). On the other hand, no such proof
systems were known for NP-complete problems, nor was there any indication of
impossibility (except when additional constraints are imposed [GH98]). In this
work, we provide strong evidence of impossibility.



Our Results. Consider interactive proofs in which the prover sends at most
$b = b(n)$ bits to the verifier on inputs of length $n$. Goldreich and Håstad
[GH98, Thm. 4] placed such languages in $\mathrm{BPTIME}^{\mathrm{NP}}(T)$,
where $T = 2^{\mathrm{poly}(b)} \cdot \mathrm{poly}(n)$, which clearly implies
nothing for languages in NP. In contrast, we show that the complements of such
languages have constant-round interactive proofs of complexity $T$ (i.e., the
verifier's computation time and the total communication are bounded by $T$). In
particular, NP-complete problems cannot have interactive proofs in which the
prover sends at most polylogarithmically many bits to the verifier unless coNP
is in the quasipolynomial analogue of AM. In fact, assuming NP has
constant-round interactive proofs with logarithmic prover-to-verifier
communication, we conclude coNP $\subseteq$ AM. As mentioned above, this is
highly unlikely.

We obtain stronger results in two special cases: 

1. We show that if a language has an interactive proof of perfect completeness
   (i.e., zero error probability on yes instances) in which the prover sends at
   most $b(n)$ bits, then it is in coNTIME($T$), where
   $T(n) = 2^{b(n)} \cdot \mathrm{poly}(n)$. Thus, unless NP = coNP, NP-complete
   languages cannot have interactive proof systems of perfect completeness in
   which the prover sends at most logarithmically many bits.

2. We show that if a language has an interactive proof in which the prover
   sends a single bit (with some restrictions on the error probabilities), then
   it has a statistical zero-knowledge interactive proof; that is, it is in the
   class SZK. This is a stronger conclusion than our main result because
   SZK $\subseteq$ AM $\cap$ coAM, as shown by Fortnow [For89] and Aiello and
   Håstad [AH91]. Recalling that Sahai and Vadhan [SV97] showed that any
   language in SZK has an interactive proof in which the prover sends a single
   bit, we obtain a surprising equivalence between these two classes.^2



Lastly, we mention one easy, but apparently new, observation regarding message
complexity. A question that is left open by the results mentioned earlier
is what happens “in between” constant rounds and polynomially many rounds.
Phrased differently, can the Speedup Theorem of Babai and Moran be improved
to show that $m(n)$-message interactive proofs are no more powerful than
$m'(n)$-message ones for some $m' = o(m)$? By combining careful
parameterizations of [LFKN92, BM88], we observe that such an improvement is
unlikely. More precisely, for every nice function $m$, we show that there is a
language which has an $m(n)$-message interactive proof but not an
$o(m(n))$-message one, provided that #SAT is not contained in the
subexponential analogue of coAM.

^2 In addition, if the error probabilities are sufficiently small, we are also
able to reduce interactive proofs in which the prover sends a single message of
several bits (e.g., $O(\log\log n)$ bits) to the 1-bit case above. We omit these
results from this extended abstract due to space constraints.

Additional Related Work. It should be noted that the results of Goldreich and
Håstad are significantly stronger when further restrictions are imposed in
addition to making the prover laconic. In particular, they obtain an upper bound
of BPTIME($T$) (rather than $\mathrm{BPTIME}^{\mathrm{NP}}(T)$), with
$T = 2^{\mathrm{poly}(b)} \cdot \mathrm{poly}(n)$, for languages possessing
either of the following kinds of interactive proofs: (a) public-coin proofs in
which the prover sends at most $b$ bits, (b) proofs in which the communication
in both directions is bounded by $b$.

There has also been a body of research on the expressive power of multi-prover
interactive proofs (MIP's) and probabilistically checkable proofs (PCP's) with
low communication, because of the importance of the communication parameter in
their applications to inapproximability. In particular, Bellare, Goldreich, and
Sudan [BGS98] give negative results about the expressive power of “laconic”
PCP's and MIP's. One-query probabilistically checkable proofs are equivalent
to interactive proofs in which the prover sends a single message, so our results
provide bounds on the former.



Our work is also related to work on knowledge complexity. Knowledge complexity,
proposed by [GMR89], aims to measure how much "knowledge" is leaked
from the prover to the verifier in an interactive proof. Several measures of knowledge
complexity were proposed by Goldreich and Petrank [GP99], and a series of
works provided upper bounds on the complexity of languages having interactive
proofs with low knowledge complexity [GP99, GOP98, PT96, SV97]. These results
are related to, but incomparable with, ours.



For example, Petrank and Tardos [PT96] showed that languages having
knowledge complexity $k = O(\log n)$ are contained in $\mathrm{AM} \cap \mathrm{coAM}$. While it
is true that the knowledge complexity of an interactive proof is bounded by the
amount of prover-to-verifier communication, their result does not yield anything
interesting for laconic interactive proofs. The reason is that their result only
applies to interactive proofs with error probabilities significantly smaller than
$2^{-k}$, and it is easy to see that interactive proofs with prover-to-verifier
communication $k = O(\log n)$ and error probability $\ll 2^{-k}$ only capture BPP (and hence
are uninteresting). Our results apply even for constant error probabilities.

Sahai and Vadhan [SV97] (improving [GP99]) showed that languages with
logarithmic knowledge complexity in the “hint sense” collapse to SZK, and 
their result applies even if the error probabilities are constant. However, this is 
also incomparable to ours, for the “hint sense” is the one measure of knowledge 
complexity which is not bounded by the prover-to-verifier communication. 



Finally, it is important to note that the situation is dramatically different 
for argument systems (also known as computationally sound proofs). 

These are like interactive proofs, but the soundness condition is restricted to 
polynomial-time provers. Kilian [Kil92] showed that NP has laconic argument
systems if strong collision-resistant hash functions exist. Specifically, under a
strong enough (but still plausible) assumption, NP has public-coin arguments
in which the verifier's randomness and the communication in both directions are
polylogarithmic. Combined with [GH98], this provides a strong separation between
the efficiency of arguments versus interactive proofs for NP; and our
results extend this separation to the case that only the prover-to-verifier
communication is counted (and the interactive proof is not required to be public
coin).

2 Preliminaries 

We assume that the reader is familiar with the basic concepts underlying interactive
proofs (and public-coin interactive proofs) (see e.g., [Sip97, Gol99, Vad00]).
Throughout, we work with interactive proofs for promise problems rather than
languages. More precisely, a promise problem $\Pi = (\Pi_Y, \Pi_N)$ is a pair of disjoint
sets of strings, corresponding to YES and NO instances, respectively. In other
words, a promise problem is simply a decision problem in which some inputs are
excluded. The definition of interactive proofs is extended to promise problems
in the natural way: we require that when the input is a YES instance, the prover
convinces the verifier to accept with high probability (completeness); and when
the input is a NO instance, the verifier accepts with low probability no matter
what strategy the prover follows (soundness). Working with promise problems
rather than languages only makes our results stronger (except for one direction
of Theorem 3.1).

We denote by $\mathrm{IP}(b, m)$ (resp., $\mathrm{AM}(b, m)$) the class of problems having interactive
proofs (resp., public-coin interactive proofs) in which the prover sends
a total of at most $b$ bits, and the total number of messages exchanged (in both
directions) is at most $m$. Note that $b$ and $m$ are integer functions of the common
input length, denoted $n$. When $b$ is not polynomial in $n$, it will be understood
that we talk of a generalization in which the verifier is allowed time polynomial
in $b$ and $n$ (rather than just in $n$). Unless specified differently, we refer to proof
systems with completeness probability 2/3 and soundness probability 1/3.

We denote $\mathrm{IP}(b) = \mathrm{IP}(b, 2b)$; that is, making only the trivial bound on the
number of messages exchanged. We denote by $\mathrm{IP}^+$ the analogue of IP when
the proof system has perfect completeness (i.e., completeness probability 1).

The class of problems with constant-round interactive proofs is denoted
$\mathrm{AM} \stackrel{\mathrm{def}}{=} \mathrm{AM}(\mathrm{poly}(n), 2) = \mathrm{IP}(\mathrm{poly}(n), O(1))$. (The second equality is by
Theorems 2.3 and 2.4 below.) When we wish to specify the completeness probability $c = c(n)$ and
soundness probability $s = s(n)$ we will use subscripts: $\mathrm{IP}_{c,s}$ and $\mathrm{AM}_{c,s}$.

Using the above notations, we recall the main results of Goldreich and
Håstad, which are the starting point for our work.

Theorem 2.1 ([GH98]). $\mathrm{AM}(b, m) \subseteq \mathrm{BPTIME}(\mathrm{poly}(2^b, m^m, n))$.

Theorem 2.2 ([GH98]). $\mathrm{IP}(b, m) \subseteq \mathrm{BPTIME}(\mathrm{poly}(2^b, m^m, n))^{\mathrm{NP}}$.

We also state some standard results that we will use: 

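Theorem 2.3 ([BM88]). $\mathrm{AM}(b, m) \subseteq \mathrm{AM}(b^2 \cdot \mathrm{poly}(m), \lceil m/2 \rceil) \subseteq \mathrm{AM}((b \cdot m)^{O(m)}, 2)$.

Theorem 2.4 ([GS89]). $\mathrm{IP}(b, m) \subseteq \mathrm{AM}(\mathrm{poly}(b, n), m)$.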
Theorem 2.5 ([BHZ87]). If $\mathrm{coNP} \subseteq \mathrm{AM}(b, 2)$, then $\Sigma_2 \subseteq \Pi_2(\mathrm{poly}(n, b))$.
In particular, if $\mathrm{coNP} \subseteq \mathrm{AM}$, then the polynomial-time hierarchy collapses to
$\mathrm{PH} = \Sigma_2 = \Pi_2$.

Above and throughout the paper, $\Sigma_i(t(n))$ (resp., $\Pi_i(t(n))$) denotes the
class of problems accepted by $t(n)$-time alternating Turing machines with $i$
alternations beginning with an existential (resp., universal) quantifier. Thus,
$\Sigma_i \stackrel{\mathrm{def}}{=} \Sigma_i(\mathrm{poly}(n))$ and $\Pi_i \stackrel{\mathrm{def}}{=} \Pi_i(\mathrm{poly}(n))$ comprise the $i$th level of the
polynomial-time hierarchy.

We will also consider SZK, the class of problems possessing statistical zero- 
knowledge interactive proofs. Rather than review the definition here, we will 
instead use a recent characterization of it in terms of complete problems which 
will suffice for our purposes. For distributions $X$ and $Y$, let $\Delta(X, Y)$ denote their
statistical difference (or variation distance), i.e., $\Delta(X, Y) = \max_S |\Pr[X \in S] - \Pr[Y \in S]|$.
We will consider distributions specified by circuits which sample
from them. More precisely, a circuit with $m$ input gates and $n$ output gates can
be viewed as a sampling algorithm for the distribution on $\{0,1\}^n$ induced by
evaluating the circuit on $m$ random input bits. Statistical Difference is the
promise problem $\mathrm{SD} = (\mathrm{SD}_Y, \mathrm{SD}_N)$, where

$$\mathrm{SD}_Y = \{(X, Y) : \Delta(X, Y) > 2/3\}$$

$$\mathrm{SD}_N = \{(X, Y) : \Delta(X, Y) < 1/3\},$$

where $X$ and $Y$ are probability distributions specified by circuits which sample
from them. More generally, for any $1 > \alpha > \beta > 0$, we will consider variants
$\mathrm{SD}^{\alpha,\beta}$, where the thresholds of 2/3 and 1/3 are replaced with $\alpha$ and $\beta$ respectively.
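As an illustration of the definition of statistical difference for circuit-sampled distributions, the following minimal sketch computes $\Delta(X, Y)$ exactly by enumerating all inputs of two small samplers. The function names and the two toy "circuits" are hypothetical and chosen only for this example; the brute-force enumeration is exponential in the number of input gates and is not meant as an efficient procedure.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def output_distribution(circuit, m):
    """Exact output distribution of `circuit` on m uniformly random input bits."""
    counts = Counter(circuit(bits) for bits in product((0, 1), repeat=m))
    total = 2 ** m
    return {out: Fraction(c, total) for out, c in counts.items()}


def statistical_difference(p, q):
    """Delta(X, Y) = max_S |Pr[X in S] - Pr[Y in S]|, i.e. half the L1 distance."""
    support = set(p) | set(q)
    return sum(abs(p.get(z, 0) - q.get(z, 0)) for z in support) / 2


# Two hypothetical toy circuits with m = 2 input gates and n = 1 output gate.
def circuit_x(bits):
    return (bits[0] & bits[1],)   # AND: outputs 1 with probability 1/4


def circuit_y(bits):
    return (bits[0] | bits[1],)   # OR: outputs 1 with probability 3/4


delta = statistical_difference(output_distribution(circuit_x, 2),
                               output_distribution(circuit_y, 2))
print(delta)
```

For this pair the distance lies strictly between 1/3 and 2/3, so the pair is neither a YES nor a NO instance of SD; the promise excludes it.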

Theorem 2.6 ([SV97]). For any constants $1 > \alpha^2 > \beta > 0$, $\mathrm{SD}^{\alpha,\beta}$ is complete
for SZK.

The following results about SZK are also relevant to us. 

Theorem 2.7 ([For89, AH91]). $\mathrm{SZK} \subseteq \mathrm{AM} \cap \mathrm{coAM}$.

Theorem 2.8 ([Oka00]). SZK is closed under complement.

Theorem 2.9 ([SV97]). $\mathrm{SZK} \subseteq \mathrm{IP}_{1-2^{-n},\,1/2}(1)$.

3 Formal Statement of Results 

We improve over Theorem 2.2, and address most of the open problems suggested
in [GH98, Sec. 3]. Our main results are listed below.

For one bit of prover-to-verifier communication, we obtain a collapse to SZK.

Theorem 3.1. For every pair of constants $c, s$ such that $1 > c^2 > s > c/2 > 0$,
$\mathrm{IP}_{c,s}(1) = \mathrm{SZK}$.

With Theorem 2.8, this gives:
Corollary 3.2. For every $c, s$ as in Thm. 3.1, $\mathrm{IP}_{c,s}(1)$ is closed under complement.

For more rounds of communication, we first obtain the following result for
interactive proofs with perfect completeness (denoted by $\mathrm{IP}^+$):

Theorem 3.3. $\mathrm{IP}^+(b) \subseteq \mathrm{coNTIME}(2^b \cdot \mathrm{poly}(n))$. In particular,
$\mathrm{IP}^+(O(\log n)) \subseteq \mathrm{coNP}$.

In the general case (i.e., with imperfect completeness), we prove:

Theorem 3.4. $\mathrm{IP}(b, m) \subseteq \mathrm{coAM}(2^b \cdot \mathrm{poly}(m^m, n), O(m))$. In particular,
$\mathrm{IP}(O(\log n), m) \subseteq \mathrm{coAM}(\mathrm{poly}(n), O(m))$, for $m = O(\log n/\log\log n)$.
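The "in particular" clause rests on the observation that both $2^b$ and $m^m$ are polynomial in $n$ for the stated parameters. A short worked check, assuming $b \le c \log n$ and $m \le c \log n/\log\log n$ for a constant $c \ge 1$ (the constant $c$ is introduced here only for the calculation) and $n$ large enough that $\log\log n \ge 1$:

```latex
\begin{align*}
2^{b} &\le 2^{c\log n} = n^{c},\\
m\log m &\le \frac{c\log n}{\log\log n}\cdot\log\!\Bigl(\frac{c\log n}{\log\log n}\Bigr)
         \le \frac{c\log n}{\log\log n}\cdot\bigl(\log\log n+\log c\bigr)
         = O(\log n),\\
m^{m} &= 2^{m\log m} = 2^{O(\log n)} = \mathrm{poly}(n).
\end{align*}
```

Hence $2^b \cdot \mathrm{poly}(m^m, n) = \mathrm{poly}(n)$ in this parameter range.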

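The above theorems provide first evidence that NP-complete problems cannot
have interactive proof systems in which the prover sends very few bits.
Further evidence toward this claim is obtained by applying Theorems 2.3 and 2.5:

Corollary 3.5. $\mathrm{IP}(b, m) \subseteq \mathrm{coAM}(\mathrm{poly}(2^b, m^m, n)^m, 2)$. In particular,
$\mathrm{IP}(O(\log n), O(1)) \subseteq \mathrm{coAM}$ and $\mathrm{IP}(\mathrm{polylog}\, n) \subseteq \mathrm{co}\widetilde{\mathrm{AM}}$.

Corollary 3.6. $\mathrm{NP} \not\subseteq \mathrm{IP}(O(\log n), O(1))$ unless the polynomial-time hierarchy
collapses (to $\Sigma_2 = \Pi_2$). $\mathrm{NP} \not\subseteq \mathrm{IP}(\mathrm{polylog}\, n)$ unless $\Sigma_2 \subseteq \widetilde{\Pi}_2$.

Above, $\mathrm{co}\widetilde{\mathrm{AM}}$ and $\widetilde{\Pi}_2$ denote the quasipolynomial-time ($2^{\mathrm{polylog}\, n}$) analogues
of coAM and $\Pi_2$.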

Finally, we state our result on message complexity. 

Theorem 3.7. Let $m(n) \le n/\log n$ be any "nice" growing function. Then
$\mathrm{AM}(\mathrm{poly}(n), m(n)) \not\subseteq \mathrm{AM}(\mathrm{poly}(n), o(m(n)))$ unless $\#\mathrm{SAT} \in \mathrm{AM}(2^{o(n)}, 2)$.

Note that, by Theorem 2.4, it is irrelevant whether we use IP or AM in this
theorem.

Due to space constraints, we only present proofs of Theorems 3.1 and 3.3 in
this extended abstract. The proof of our main result (Theorem 3.4) is significantly
more involved, and will be given in the full version of the paper.

4 Extremely Laconic Provers (Saying Only One Bit) 

In this section, we prove Theorem 3.1. The proof is based on the following lemma,
along with previous results.

Lemma 4.1. Every problem in $\mathrm{IP}_{c,s}(1)$ reduces to $\mathrm{SD}^{c,s}$.

Proof. Let $(P, V)$ be an interactive proof for some problem so that the prover
sends a single bit during the entire interaction. We may thus assume that on
input $x$ and internal coin tosses $r$, the verifier first sends a message $y = V_x(r)$,
the prover answers with a bit $\sigma \in \{0,1\}$, and the verifier decides whether to
accept or reject by evaluating the predicate $V_x(r, \sigma) \in \{0,1\}$.
A special case: unique answers. To demonstrate the main idea, we consider
first the natural case in which for every pair $(x, r)$ there exists exactly one $\sigma$ such
that $V_x(r, \sigma) = 1$. (Note that otherwise, the interaction on input $x$ and verifier's
internal coin tosses $r$ is redundant, since the verifier's final decision is unaffected
by it.) For this special case (which we refer to as unique answers), we will prove
the following:

Claim 4.2. If a problem has an $\mathrm{IP}_{c,s}(1)$ proof system with unique answers, then
it reduces to $\mathrm{SD}^{2c-1,\,2s-1}$.

Let $\sigma_x(r)$ denote the unique $\sigma$ satisfying $V_x(r, \sigma) = 1$. The prover's ability to
convince the verifier is related to the amount of information regarding $\sigma_x(r)$ that
is revealed by $V_x(r)$. For example, if for some $x$, $\sigma_x(r)$ is determined by $V_x(r)$ then
the prover can convince the verifier to accept $x$ with probability 1 (by replying
with $\sigma_x(r)$). If, on the other hand, for some $x$, $\sigma_x(r)$ is statistically independent
of $V_x(r)$ (and unbiased), then there is no way for the prover to convince the
verifier to accept $x$ with probability higher than 1/2. This suggests the reduction
$x \mapsto (C_x^1, C_x^2)$, where $C_x^1(r) = (V_x(r), \sigma_x(r))$ and $C_x^2(r) = (V_x(r), \overline{\sigma_x(r)})$, where
$\overline{b}$ denotes the complement of a bit $b$.

Now we relate the statistical difference between the distributions sampled by
$C_x^1$ and $C_x^2$ to the maximum acceptance probability of the verifier. Since the first
components of $C_x^1$ and $C_x^2$ are distributed identically, their statistical difference
is exactly the average over the first component $V_x(r)$ of the statistical difference
between the second components conditioned on $V_x(r)$. That is,

$$\Delta(C_x^1, C_x^2) = \mathop{\mathrm{E}}_{y \leftarrow V_x}\bigl[\Delta(\sigma_x|_y, \overline{\sigma_x}|_y)\bigr],$$

where $\sigma_x|_y$ denotes the distribution of $\sigma_x(r)$ when $r$ is uniformly distributed
among $\{r' : V_x(r') = y\}$. For any $y$ and $b \in \{0,1\}$, let $q_{b|y}$ denote the probability
that $\sigma_x|_y = b$. Then, for any fixed $y$, $\Delta(\sigma_x|_y, \overline{\sigma_x}|_y) = |q_{1|y} - q_{0|y}| = 2q_y - 1$,
where $q_y = \max_{b \in \{0,1\}} \{q_{b|y}\} \ge 1/2$. So, we have:

$$\Delta(C_x^1, C_x^2) = \mathop{\mathrm{E}}_{y \leftarrow V_x}[2q_y - 1].$$

On the other hand, the optimal prover strategy in $(P, V)$ is: upon receiving $y$,
respond with the bit $b$ that maximizes $q_{b|y}$. When the prover follows this strategy, we
have

$$\Pr[V \text{ accepts } x] = \mathop{\mathrm{E}}_{y \leftarrow V_x}[q_y].$$

Putting the last two equations together, we conclude that $\Delta(C_x^1, C_x^2) = 2 \cdot \Pr[V \text{ accepts } x] - 1$.
Thus if the proof system has completeness and soundness
error bounds $c$ and $s$, respectively, then the reduction maps instances to
pairs having distance bounds $2c - 1$ and $2s - 1$, respectively. This establishes
Claim 4.2.

Note that under the hypothesis of the special case, for every $x$ the prover may
convince the verifier to accept $x$ with probability at least 1/2 (and so such a
nontrivial proof system must have soundness at least 1/2).
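The identity behind Claim 4.2, namely that the statistical difference of the two sampled distributions equals twice the optimal acceptance probability minus one, can be checked by brute force on a small example. The sketch below is an illustration only: the toy verifier (`message`, `answer`) and the helper `analyse` are hypothetical and not part of the paper.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def analyse(message, answer, num_coins):
    """Return (optimal acceptance probability, Delta(C1, C2)) for a one-bit
    proof with unique answers, given y = message(r) and sigma = answer(r)."""
    # counts[y][b] = number of coin strings r with message(r) = y and answer(r) = b
    counts = defaultdict(lambda: [0, 0])
    for r in product((0, 1), repeat=num_coins):
        counts[message(r)][answer(r)] += 1
    total = 2 ** num_coins
    # Optimal prover: on message y, reply with the more likely answer bit.
    accept = Fraction(sum(max(c) for c in counts.values()), total)
    # C1 outputs (y, sigma) and C2 outputs (y, 1 - sigma); half their L1 distance.
    delta = Fraction(sum(abs(c[0] - c[1]) for c in counts.values()), total)
    return accept, delta


# Hypothetical toy verifier with two coins: it reveals r[0]; the unique
# accepted answer is r[0] AND r[1].
accept, delta = analyse(lambda r: r[0], lambda r: r[0] & r[1], 2)
print(accept, delta)
```

Here the message determines the answer when `r[0] == 0` and leaves it unbiased otherwise, which mirrors the two extreme cases discussed in the text.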

The general case. We now proceed to deal with the general case in which there
may exist pairs $(x, r)$ so that either both $\sigma$'s or none of them satisfy $V_x(r, \sigma) = 1$.
We do so by reducing this general case to the special case.

Claim 4.3. If a problem is in $\mathrm{IP}_{c,s}(1)$, then it has an $\mathrm{IP}_{(1+c)/2,\,(1+s)/2}(1)$ proof
system with unique answers.

Clearly, the lemma follows from this claim and the previous one, so we proceed
to prove the claim.

Proof of claim. Let $(P, V)$ be a general $\mathrm{IP}_{c,s}(1)$ proof system. Consider
the following modified verifier strategy.

$V'(x)$: Generate coin tosses $r$ for the original verifier and do one of the
following based on the number $j$ of possible prover responses $\sigma$ for
which $V_x(r, \sigma) = 1$.

[j = 2] Send the prover a special message "respond with 1" and
accept if the prover responds with 1.

[j = 1] Randomly do one of the following (each with prob. 1/2):

- Send the prover $y = V_x(r)$ and accept if the prover responds
  with the unique $\sigma$ such that $V_x(r, \sigma) = 1$.

- Send the prover a special message "respond with 1" and
  accept if the prover responds with 1.

[j = 0] Choose a random bit $\sigma$. Send the prover a special message
"guess my bit" and accept if the prover responds with $\sigma$.

Clearly, $V'$ has unique answers. It can be shown that if an optimal
prover makes $V$ accept with probability $\delta$, then an optimal prover makes
$V'$ accept with probability $(1 + \delta)/2$. The claim follows. □

■ 
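The transformation from $V$ to $V'$ in the proof of Claim 4.3 can be exercised on a small example. The following sketch is an assumption-laden illustration: a verifier is modelled as a list of weighted outcomes `(weight, message, accepted_bits)`, and all function names and the toy verifier are hypothetical. It checks that the optimal acceptance probability moves from $\delta$ to $(1 + \delta)/2$ on this instance.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def optimal_acceptance(outcomes):
    """Best prover: for each message, send the bit with the larger accepted weight."""
    by_message = defaultdict(lambda: [Fraction(0), Fraction(0)])
    for weight, msg, accepted in outcomes:
        for bit in accepted:
            by_message[msg][bit] += weight
    return sum(max(w) for w in by_message.values())


def original_outcomes(message, accepts, num_coins):
    w = Fraction(1, 2 ** num_coins)
    return [(w, message(r), {s for s in (0, 1) if accepts(r, s)})
            for r in product((0, 1), repeat=num_coins)]


def unique_answer_outcomes(outcomes):
    """Outcomes of the modified verifier V' described in the proof of Claim 4.3."""
    new = []
    for w, msg, accepted in outcomes:
        if len(accepted) == 2:                      # case j = 2
            new.append((w, "respond with 1", {1}))
        elif len(accepted) == 1:                    # case j = 1
            new.append((w / 2, ("original", msg), accepted))
            new.append((w / 2, "respond with 1", {1}))
        else:                                       # case j = 0
            new.append((w / 2, "guess my bit", {0}))
            new.append((w / 2, "guess my bit", {1}))
    return new


# Hypothetical toy verifier with two coins: on r = (0,0) both answers are
# accepted, on (0,1) none, and on (1, t) only the answer t.
def accepts(r, s):
    return r == (0, 0) or (r[0] == 1 and s == r[1])


V = original_outcomes(lambda r: r[0], accepts, 2)
V_prime = unique_answer_outcomes(V)
delta = optimal_acceptance(V)
delta_prime = optimal_acceptance(V_prime)
print(delta, delta_prime)
```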

Theorem 3.1 follows from Lemma 4.1, Theorem 2.6, and Theorem 2.9. Details
will be given in the full version of the paper. The $c^2 > s$ constraint in Theorem 3.1
is due to the analogous constraint in Theorem 2.6. Indeed, we can establish the
following equivalence (also to be proven in the full version of the paper):

Theorem 4.4. The following are equivalent.

1. For every $\alpha, \beta$ such that $1 > \alpha > \beta > 0$, $\mathrm{SD}^{\alpha,\beta}$ is in SZK (and is therefore
also complete).

2. For every $c, s$ such that $1 > c > s > c/2 > 0$, $\mathrm{IP}_{c,s}(1) = \mathrm{SZK}$.

Finally, we remark that the condition $s > c/2$ in Theorems 3.1 and 4.4 is
necessary, for $\mathrm{IP}_{c,s}(1) = \mathrm{BPP}$ for any $s < c/2$.

Note that this relationship is reversed by the natural IP(1) system for $\mathrm{SD}^{\alpha,\beta}$ in which
the verifier selects at random a single sample from one of the two distributions and
asks the prover to guess which of the distributions this sample came from. If the
distributions are at distance $\delta$ then the prover succeeds with probability $\frac{1}{2} + \frac{\delta}{2}$. Thus
applying this proof system to $\mathrm{SD}^{2c-1,\,2s-1}$, we obtain completeness and soundness
bounds $c$ and $s$, respectively.
5 Laconic Provers with Perfect Completeness

In this section, we prove Theorem 3.3.

Theorem (restated): If a problem $\Pi$ has an interactive proof system with
perfect completeness in which the prover-to-verifier communication is at most
$b(\cdot)$ bits, then $\Pi \in \mathrm{coNTIME}(2^{b(n)} \cdot \mathrm{poly}(n))$.

Proof. We take a slightly unusual look at the interactive proof system for $\Pi$,
viewing it as a “progressively finite game” between two players P* and V*. P* 
corresponds to the usual prover strategy and its aim is to make the original 
verifier accept the common input. V* is a “cheating verifier” and its aim is to 
produce an interaction that looks legal and still makes the original verifier reject 
the common input. 

To make this precise, let b = b(n) be the bound on the prover-to-verifier 
communication in (P, V) on inputs of length n, and let m = m{n) be the number 
of messages exchanged. Without loss of generality, we may assume that V
sends all its coin tosses in the last message. A transcript is a sequence of m 
strings, corresponding to (possible) messages exchanged between P and V. We 
call a transcript t consistent (for x) if every verifier message in t is the message 
V would have sent given input x, the previous messages in t, and the coin tosses 
specified by the last message in t. We call a consistent t rejecting if V would 
reject at the end of such an interaction. 

Now, the game between P* and V* has the same structure as the interaction
between P and V on input x: a total of m messages are exchanged and P* is
allowed to send at most b bits. The game between P* and V* yields a transcript t.
We say that V* wins if t is consistent and rejecting, and that P* wins otherwise.
We stress that V* need not emulate the original verifier nor is it necessarily
implemented in probabilistic polynomial time.

This constitutes a “perfect information finite game in extensive form” (also 
known as a "progressively finite game") and Zermelo's Theorem (cf. [Tuc95,
Sec. 10.2]) says that exactly one of the two players has a winning strategy, that
is, a (deterministic) strategy that will guarantee its victory no matter how the 
other party plays. 

Using the perfect completeness condition, we infer that if the common input
x is a YES instance then there exists a winning strategy for P*. (This is because
the optimal prover for the original interactive proof wins whenever V* plays in
a manner consistent with some sequence of coin tosses for the original verifier,
and it wins by definition if V* plays inconsistently with any such sequence.)
On the other hand, by the soundness condition, if the common input is a NO
instance then there exists no winning strategy for P*. (This is because in this
case no prover strategy can convince the original verifier with probability 1.) By
the above, it follows that whenever the common input is a NO instance there
exists a winning strategy for V*.

Thus, a proof that x is a NO instance consists of a winning strategy for
V*. Such a strategy is a function mapping partial transcripts of P* messages to
the next V* message. Thus, such a strategy is fully specified by a function from
$\bigcup_{i=0}^{b(n)} \{0,1\}^i$ to $\{0,1\}^{\mathrm{poly}(n)}$, and has description length $\mathrm{poly}(n) \cdot 2^{b(n)+1}$. To verify
that such a function constitutes a winning strategy for V*, one merely tries all
possible deterministic strategies for P* (i.e., all possible $b(n)$-bit long strings).
The theorem follows. □
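The final verification step of the proof, trying every $b$-bit prover string against a candidate strategy for V*, can be sketched as follows. This is a simplified illustration under stated assumptions: the prover is assumed to send exactly one bit per round, `strategy` maps the prover bits seen so far to the next V* message, and `wins_for_verifier` stands in for the "consistent and rejecting" check on a full transcript. All of these names are hypothetical.

```python
from itertools import product


def verifier_strategy_wins(strategy, wins_for_verifier, b):
    """Check a candidate V* strategy against all 2**b prover bit strings."""
    for prover_bits in product((0, 1), repeat=b):
        transcript = []
        for i in range(b):
            transcript.append(strategy(prover_bits[:i]))  # V* moves first in each round
            transcript.append(prover_bits[i])             # P* replies with a single bit
        if not wins_for_verifier(tuple(transcript)):
            return False                                  # this prover string beats V*
    return True


# Toy game with b = 2 and transcript (v1, p1, v2, p2): V* wins exactly when
# its second message equals the prover's first bit.
def toy_win(transcript):
    return transcript[2] == transcript[1]


def echo_strategy(prefix):
    return prefix[-1] if prefix else 0   # repeat the prover's last bit


def constant_strategy(prefix):
    return 0                             # ignore the prover entirely


print(verifier_strategy_wins(echo_strategy, toy_win, 2))
print(verifier_strategy_wins(constant_strategy, toy_win, 2))
```

The loop runs over $2^b$ strings with polynomial work per string, which matches the $2^{b(n)} \cdot \mathrm{poly}(n)$ bound in the theorem.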
Abstract. We consider the quantum complexities of the following three
problems: searching an ordered list, sorting an un-ordered list, and deciding
whether the numbers in a list are all distinct. Letting $N$ be
the number of elements in the input list, we prove a lower bound of
$\frac{1}{\pi}(\ln(N) - 1)$ accesses to the list elements for ordered searching, a lower
bound of $\Omega(N \log N)$ binary comparisons for sorting, and a lower bound
of $\Omega(\sqrt{N} \log N)$ binary comparisons for element distinctness. The previously
best known lower bounds are $\frac{1}{12}\log_2(N) - O(1)$ due to Ambainis,
$\Omega(N)$, and $\Omega(\sqrt{N})$, respectively. Our proofs are based on a weighted
all-pairs inner product argument.

In addition to our lower bound results, we give a quantum algorithm
for ordered searching using roughly $0.631 \log_2(N)$ oracle accesses. Our
algorithm uses a quantum routine for traversing through a binary search
tree faster than classically, and it is of a nature very different from a
faster algorithm due to Farhi, Goldstone, Gutmann, and Sipser.



1 Introduction 

The speedups of quantum algorithms over classical algorithms have been a main
reason for the current interest in quantum computing. One central question
regarding the power of quantum computing is: How much speedup is possible?
Although dramatic speedups seem possible, as in the case of Shor's algorithms
for factoring and for finding discrete logarithms, provable speedups are
found only in restricted models such as the black box model.

In the black box model, the input is given as a black box, so that the only 
way the algorithm can obtain information about the input is via queries, and 
the complexity measure is the number of queries. Many problems that allow 
provable quantum speedups can be formulated in this model, an example being 
the unordered search problem considered by Grover. Several tight lower
bounds are now known for this model, most of them being based on techniques
introduced in IHOQ. 

We study the quantum complexities of the following three problems. 

Ordered searching. Given a list of numbers $x = (x_0, x_1, \ldots, x_{N-1})$ in
non-decreasing order and some number $y$, find the minimal $i$ such that $y \le x_i$.
We assume that $x_{N-1} = \infty > y$ so that the problem is always well-defined.

Sorting. Given a list of numbers $x = (x_0, x_1, \ldots, x_{N-1})$, output a permutation
$\sigma$ on the set $\{0, \ldots, N-1\}$ so that the list $(x_{\sigma(0)}, x_{\sigma(1)}, \ldots, x_{\sigma(N-1)})$ is in
non-decreasing order.

Element distinctness. Given a list of numbers $x = (x_0, x_1, \ldots, x_{N-1})$, are
they all distinct?

These problems are closely related and are among the most fundamental and 
most studied problems in the theory of algorithms. They can also be formulated 
naturally in the black box model. For the ordered searching problem, we consider
queries of the type "$x_i = ?$", and for the sorting and element distinctness
problems, we consider queries of the type "Is $x_i < x_{i'}$?", which are simply binary
comparisons. Let $H_i = \sum_{k=1}^{i} \frac{1}{k}$ denote the $i$th harmonic number. We prove a
lower bound for each of these three problems.

Theorem 1. Any quantum algorithm for ordered searching that errs with probability
at most $\epsilon \ge 0$ requires at least

$$\Bigl(1 - 2\sqrt{\epsilon(1-\epsilon)}\Bigr) \frac{1}{\pi} (H_N - 1) \tag{1}$$

queries to the oracle. In particular, any exact quantum algorithm requires more
than $\frac{1}{\pi}(\ln(N) - 1) \approx 0.220 \log_2 N$ queries.

Theorem 2. Any comparison-based quantum algorithm for sorting that errs
with probability at most $\epsilon \ge 0$ requires at least

$$\Bigl(1 - 2\sqrt{\epsilon(1-\epsilon)}\Bigr) \frac{N}{2\pi} (H_N - 1) \tag{2}$$

comparisons. In particular, any exact quantum algorithm requires more than
$\frac{N}{2\pi}(\ln(N) - 1) \approx 0.110\, N \log_2 N$ comparisons.

Theorem 3. Any comparison-based quantum algorithm for element distinctness
that errs with probability at most $\epsilon \ge 0$ requires at least

$$\Bigl(1 - 2\sqrt{\epsilon(1-\epsilon)}\Bigr) \frac{\sqrt{N}}{2\pi} (H_N - 1) \tag{3}$$

comparisons.

The previously best known quantum lower bound for ordered searching is
$\frac{1}{12}\log_2(N) - O(1)$, due to Ambainis. For comparison-based sorting and element
distinctness, the previously best known quantum lower bounds are respectively
$\Omega(N)$ and $\Omega(\sqrt{N})$, both of which can be proven in many ways.
We prove our lower bounds by utilizing what we refer to as a weighted all-pairs
inner product argument, or a probabilistic adversary argument. This proof
technique is based on the work of Bennett, Bernstein, Brassard, and Vazirani
and Ambainis [2].

Farhi, Goldstone, Gutmann, and Sipser have given an exact quantum
algorithm for ordered searching using roughly $0.526 \log_2(N)$ queries. We provide
an alternative quantum algorithm that is exact and uses $\log_3(N) + O(1) \approx 0.631 \log_2(N)$
queries. Our construction is radically different from the construction
proposed by Farhi et al., and these are the only constructions known
leading to quantum algorithms using at most $c \log_2(N)$ queries for some constant
$c$ strictly less than 1.
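The numerical constants quoted in this introduction (0.220, 0.110 and 0.631) can be checked directly. The sketch below assumes the bounds have the form $(1 - 2\sqrt{\epsilon(1-\epsilon)})\,(H_N - 1)/\pi$ for searching and $N/2$ times that for sorting, which is how we read Theorems 1 and 2; the function names are hypothetical.

```python
import math


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def searching_lower_bound(n, eps=0.0):
    """Assumed form of the Theorem 1 bound on the number of queries."""
    return (1 - 2 * math.sqrt(eps * (1 - eps))) * (harmonic(n) - 1) / math.pi


def sorting_lower_bound(n, eps=0.0):
    """Assumed form of the Theorem 2 bound on the number of comparisons."""
    return n / 2 * searching_lower_bound(n, eps)


search_const = math.log(2) / math.pi        # ln(N)/pi = search_const * log2(N)
sort_const = math.log(2) / (2 * math.pi)    # N ln(N)/(2 pi) = sort_const * N log2(N)
algo_const = 1 / math.log2(3)               # log3(N) = algo_const * log2(N)

print(round(search_const, 3), round(sort_const, 3), round(algo_const, 3))
print(searching_lower_bound(1024), sorting_lower_bound(1024))
```

Since $H_N > \ln N$, the exact-algorithm bounds are strictly larger than the simplified $\ln$-based expressions, in line with the phrase "more than" in the theorems.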

Whereas most quantum algorithms are based on Fourier transforms and amplitude
amplification [7], our algorithm is based on binary search trees. We initiate
several applications of the binary search algorithm in quantum parallel
and let them find the element we are searching for in teamwork. By cooperating,
these applications can traverse the binary search tree faster than classically,
hereby reducing the complexity from $\log_2(N)$ to roughly $\log_3(N)$.

There are at least three reasons why the quantum complexities of the three 
problems are of interest. Firstly because of their significance in algorithmics in 
general. Secondly because these problems possess some symmetries and periodic- 
ities of a different nature than other studied problems in quantum algorithmics. 
Determining symmetries and periodicities seems to be a primary ability of quan- 
tum computers and it is not at all clear how far-reaching this skill is. Thirdly 
because searching and sorting represent non-Boolean non-symmetric functions. 
A (partial) function is said to be symmetric if it is invariant under permutation
of its input. Only a few non-trivial quantum bounds for non-Boolean and
non-symmetric functions are known.

The rest of the paper is organized as follows. We first discuss the model in
Sect. 2, present our general technique for proving lower bounds in Sect. 3.1,
and then apply it to the three problems in Sects. 3.2–3.4. We give our quantum
algorithm for ordered searching in Sect. 4 and conclude in Sect. 5.

2 Quantum Black Box Computing 

We give a formal definition of the black box model, which is slightly different
from, but equivalent to, the definition of Beals, Buhrman, Cleve, Mosca,
and de Wolf. Fix some positive integer $N > 0$. The input
$x = (x_0, \ldots, x_{N-1}) \in \{0,1\}^N$ is given as an oracle, and the only way we can access
the bits of the oracle is via queries. A query implements the operator




Here $i$ and $z$ are non-negative integers. By a query to oracle $x$ we mean an
application of the unitary operator $O_x$. We sometimes refer to $O_x$ as the oracle.
A quantum algorithm $A$ that uses $T$ queries to an oracle $O$ is a unitary operator
of the form

$$A = (UO)^T U. \tag{5}$$

We always apply algorithm $A$ on the initial state $|0\rangle$. For every integer $j \ge 0$
and every oracle $x$, let

$$|\psi_x^j\rangle = (UO_x)^j U |0\rangle \tag{6}$$

denote the state after $j$ queries, given oracle $x$. After applying $A$, we always
measure the final state in the computational basis.

Consider the computation of some function $f : S \to \{0,1\}^m$, where $S \subseteq \{0,1\}^N$.
We say that algorithm $A$ computes $f$ with error probability bounded
by $\epsilon$, for some constant $\epsilon$ with $0 \le \epsilon < 1/2$, if for any $x \in S$, the probability of
observing $f(x)$ when the $m$ rightmost bits of $|\psi_x^T\rangle$ are measured is at least $1 - \epsilon$.

3 Lower Bounds

3.1 General Technique

We use the notation of Sect. 2. For any $\epsilon \ge 0$, let $\epsilon' = 2\sqrt{\epsilon(1-\epsilon)}$.

The computation always starts in the same initial state $|0\rangle$, so for all oracles
$x \in S$ we have $|\psi_x^0\rangle = |0\rangle$. If for two input oracles $x, y \in S$, the correct answers
are different, i.e., if $f(x) \ne f(y)$, then the corresponding final states $|\psi_x^T\rangle$ and
$|\psi_y^T\rangle$ must be almost orthogonal.

Lemma 4. For all oracles $x, y \in S$ so that $f(x) \ne f(y)$, $|\langle\psi_x^T|\psi_y^T\rangle| \le \epsilon'$.

Now consider a probability distribution over those pairs of inputs $(x, y) \in S \times S$
for which $f(x) \ne f(y)$. For each integer $j \ge 0$, we use the following
quantity to quantify the average progress of the algorithm in distinguishing any
two inputs after applying $(UO)^j U$,

$$W_j = \mathop{\mathrm{E}}_{(x,y)}\bigl[\,|\langle\psi_x^j|\psi_y^j\rangle|\,\bigr]. \tag{7}$$

Observe that $W_0 = 1$ and that $W_T \le \epsilon'$ by Lemma 4. By proving that for every $j$
with $0 \le j < T$, we have $|W_j - W_{j+1}| \le \delta$, we conclude that $T \ge (1 - \epsilon')/\delta$.

For simplicity of presentation, we scale the probabilities by using a weight
function $\omega : S \times S \to \mathbb{R}^+$. From now on, we use the following definition of $W_j$
to quantify the overall progress of the algorithm.

$$W_j = \sum_{x,y \in S} \omega(x, y)\, |\langle\psi_x^j|\psi_y^j\rangle|.$$
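The step from a per-query bound on the progress to the query lower bound $T \ge (1 - \epsilon')/\delta$ stated above is a telescoping argument. Written out, using only $W_0 = 1$, $W_T \le \epsilon'$ and the assumed per-query bound $|W_j - W_{j+1}| \le \delta$:

```latex
\begin{align*}
1 - \epsilon' \;\le\; W_0 - W_T
  \;=\; \sum_{j=0}^{T-1} \bigl(W_j - W_{j+1}\bigr)
  \;\le\; \sum_{j=0}^{T-1} \bigl|W_j - W_{j+1}\bigr|
  \;\le\; T\,\delta ,
\qquad\text{hence}\qquad
T \;\ge\; \frac{1-\epsilon'}{\delta}.
\end{align*}
```

With a weight function the same chain starts from $W_0 - W_T \ge (1 - \epsilon') W_0$ and gives $T \ge (1 - \epsilon') W_0/\delta$.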



Our technique is a natural generalization of Ambainis' approach [2], which
uses uniform distributions over subsets of $S \times S$. Our lower bound proofs imply
that non-uniform distributions can give better lower bounds. Clearly, finding a
"good" distribution is an important step in applying our technique. Another
important step is to find a tight bound on the progress after each oracle query.
We end this subsection by introducing some notation and stating two lemmas
we require when bounding the progress. For every $i \ge 0$, let $P_i = \sum_{z \ge 0} |z; i\rangle\langle z; i|$
denote the projection operator onto the subspace querying the $i$th oracle bit. For
$i < 0$, operator $P_i$ is taken as the zero projection. The following lemma, which
may be proven by the Cauchy-Schwarz inequality, bounds the quantified progress
that one oracle query makes in distinguishing two inputs $x$ and $y$.

Lemma 5. For any oracles $x, y \in \{0,1\}^N$, and any integer $j \ge 0$,

$$|\langle\psi_x^j|\psi_y^j\rangle| - |\langle\psi_x^{j+1}|\psi_y^{j+1}\rangle| \le 2 \sum_{i : x_i \ne y_i} \|P_i|\psi_x^j\rangle\| \cdot \|P_i|\psi_y^j\rangle\|. \tag{8}$$

We sometimes write $|\psi_x\rangle$ as shorthand for $|\psi_x^j\rangle$ once integer $j$ is fixed.

Let $A = [\alpha_{k,\ell}]_{1 \le k,\ell < \infty}$ be the Hilbert matrix with $\alpha_{k,\ell} = 1/(k + \ell - 1)$, and
$|||\cdot|||_2$ be the spectral norm, i.e., for any complex-valued matrix $M \in \mathbb{C}^{m \times m}$, the
norm $|||M|||_2$ is defined as $\max\{\|Mx\|_2\}$, where the maximum is taken over all
unit vectors $x \in \mathbb{C}^m$. Let $B_N = [\beta_{k,\ell}]_{1 \le k,\ell \le N}$ be the matrix where entry $\beta_{k,\ell}$ is
$\frac{1}{k+\ell-1}$ if $k + \ell \le N + 1$, and 0 otherwise. Clearly $|||B_N|||_2 \le |||A|||_2$ for any $N > 0$.
Our lower bound proofs rely on the following property of the Hilbert matrix.

Lemma 6 (E.g.: Choi [10]). $|||A|||_2 = \pi$. Hence, $|||B_N|||_2 \le \pi$.

3.2 Lower Bound for Ordered Searching

The first non-trivial quantum lower bound on ordered searching proven was
$\Omega(\sqrt{\log_2(N)}/\log_2\log_2(N))$, due to Buhrman and de Wolf [3] by an ingenious
reduction from the parity problem. Farhi, Goldstone, Gutmann, and Sipser
improved this to $\log_2(N)/2\log_2\log_2(N)$, and Ambainis then proved the previously
best known lower bound of $\frac{1}{12}\log_2(N) - O(1)$. In those works, an inner
product argument along the lines of [5] is used, as we do here. In this section, we
improve the lower bound by a constant factor.

For the purpose of proving the lower bound, we assume that each of the
$N$ input numbers is either 0 or 1, and that the input does not consist of all
zeroes. That is, the set $S$ of possible inputs are the ordered $N$-bit strings of
non-zero Hamming weight. The search function $f : S \to \{0,1\}^m$ is defined by
$f(x) = \min\{0 \le i < N \mid x_i = 1\}$, where we identify the result $f(x)$ with
its binary encoding as a bit-string of length $m = \lceil\log_2(N)\rceil$. As our weight
function $\omega$, we choose the inverse of the difference in Hamming weights,

$$\omega(x, y) = \begin{cases} \dfrac{1}{f(y) - f(x)} & \text{if } 0 \le f(x) < f(y) < N \\[1ex] 0 & \text{otherwise.} \end{cases} \tag{9}$$

With this choice, we have that $W_0 = N H_N - N$ and by Lemma 4 also that
$W_T \le \epsilon' W_0$. Theorem 1 then follows from the next lemma.

Lemma 7. For every $j$ with $0 \le j < T$ we have that $|W_j - W_{j+1}| \le \pi N$.



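Proof. As shorthand, we write $|\psi_{f(x)}\rangle$ for $|\psi_x^j\rangle$. By Lemma 5,

$$|W_j - W_{j+1}| \le 2 \sum_{k=0}^{N-2} \sum_{\ell=k+1}^{N-1} \frac{1}{\ell - k} \sum_{i=k}^{\ell-1} \|P_i|\psi_k\rangle\| \cdot \|P_i|\psi_\ell\rangle\|
 = 2 \sum_{d=1}^{N-1} \frac{1}{d} \sum_{i=0}^{d-1} \sum_{k=0}^{N-d-1} \|P_{k+i}|\psi_k\rangle\| \cdot \|P_{k+i}|\psi_{k+d}\rangle\|.$$

Let vectors $\gamma = [\gamma_i]_{0 \le i < N-1} \in \mathbb{R}^{N-1}$ and $\delta = [\delta_i]_{0 \le i < N-1} \in \mathbb{R}^{N-1}$ be defined by

$$\gamma_i = \Bigl(\sum_{k=0}^{N-1} \|P_{k+i}|\psi_k\rangle\|^2\Bigr)^{1/2} \quad\text{and}\quad \delta_i = \Bigl(\sum_{k=0}^{N-1} \|P_{k-i-1}|\psi_k\rangle\|^2\Bigr)^{1/2}.$$

Then, by the Cauchy-Schwarz inequality,

$$|W_j - W_{j+1}| \le 2 \sum_{d=1}^{N-1} \sum_{i=0}^{d-1} \frac{1}{d}\, \gamma_i\, \delta_{d-i-1} = 2\, \gamma^t B_{N-1}\, \delta, \tag{10}$$

where $t$ denotes matrix transposition. Since each vector $|\psi_k\rangle$ is of unit norm,
we have $\|\gamma\|_2^2 + \|\delta\|_2^2 \le N$, so $\|\gamma\|_2 \|\delta\|_2 \le N/2$. The matrix product $2\gamma^t B_{N-1}\delta$ is
upper bounded by $2\|\gamma\|_2 \cdot |||B_{N-1}|||_2 \cdot \|\delta\|_2$, which is at most $\pi N$ by Lemma 6. □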
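The last step of this proof uses the bound $\pi$ on the spectral norm of the truncated Hilbert matrix. As a numerical sanity check (not a proof), the following pure-Python sketch estimates that norm by power iteration. The function names are hypothetical, and the estimate is a Rayleigh quotient, so for this symmetric matrix it can only approach the true norm from below.

```python
import math


def truncated_hilbert(n):
    """B_n: entry (k, l), 1-based, is 1/(k+l-1) if k + l <= n + 1 and 0 otherwise."""
    return [[1.0 / (k + l - 1) if k + l <= n + 1 else 0.0
             for l in range(1, n + 1)] for k in range(1, n + 1)]


def spectral_norm_estimate(matrix, iterations=200):
    """Power iteration; returns the Rayleigh quotient v^T M v of the last iterate."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    estimate = 0.0
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
        estimate = sum(w[i] * v[i] for i in range(n))
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return estimate


for size in (5, 30):
    print(size, spectral_norm_estimate(truncated_hilbert(size)))
```

The estimates grow with the matrix size but stay below $\pi$, consistent with Lemma 6.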

3.3 Lower Bound for Sorting 

We assume that the $N$ numbers to be sorted, $x = (x_0, \ldots, x_{N-1})$, correspond
to some permutation $\sigma$ on $\{0, 1, \ldots, N-1\}$. That is, $x_i = \sigma(i)$ for every $0 \le i < N$.
We assume the input to the quantum algorithm is the comparison matrix
$M_\sigma = [m_{i,i'}]_{0 \le i,i' < N}$ with

$$m_{i,i'} = \begin{cases} 1 & \text{if } \sigma(i) < \sigma(i') \\ 0 & \text{otherwise.} \end{cases}$$

One comparison corresponds to one application of the oracle operator 

2>0 2,2^>0 

To simplify notation, we sometimes identify the input $M_\sigma$ with the underlying
permutation $\sigma$.

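For every pair $\{i, i'\}$ of indices with $0 \le i,i' < N$, let

$$P_{ii'} = \sum_{z \ge 0} |z;i,i'\rangle\langle z;i,i'| + \sum_{z \ge 0} |z;i',i\rangle\langle z;i',i|$$

denote the projection operator onto the subspace comparing the $i$th and $(i')$th
elements. For any vector $|\psi\rangle$, we use $|\psi\rangle_{(\sigma,k,\ell)}$ as shorthand for $P_{\sigma^{-1}(k)\,\sigma^{-1}(\ell)}|\psi\rangle$.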
For every permutation $\sigma$, and all integers $0 \le k \le N-2$ and $1 \le d \le
N-1-k$, define a new permutation,

$$\sigma^{(k,d)} = (k, k+1, \ldots, k+d) \circ \sigma. \tag{11}$$

If $\tau = \sigma^{(k,d)}$ then

$$\sigma^{-1}(i) = \begin{cases} \tau^{-1}(k) & \text{if } i = k+d \\ \tau^{-1}(i+1) & \text{if } k \le i < k+d \\ \tau^{-1}(i) & \text{otherwise.} \end{cases} \tag{12}$$

This implies that the comparison matrices $M_\sigma$ and $M_\tau$ differ only on the
following pairs of entries,

$$\{\sigma^{-1}(k+d), \sigma^{-1}(k+i)\} = \{\tau^{-1}(k), \tau^{-1}(k+i+1)\} \tag{13}$$

for all $i$ with $0 \le i < d$.

Informally, if $M_\sigma$ corresponds to some list $x$, then $M_\tau$ corresponds to the
list $y$ obtained by replacing the element of rank $k+d$ in $x$ by a new element of
rank $k$ (the element in $x$ that had rank $k$ then has rank $k+1$ in $y$, etc.). The
only way the algorithm can distinguish $\sigma$ from $\tau$ is by comparing the element of
rank $k+d$ in $x$ with one of the $d$ elements of rank $k+i$ for some $0 \le i < d$.
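The claim that the two comparison matrices differ exactly on the $d$ pairs described above can be checked exhaustively for small $N$. The sketch below assumes a permutation is stored as a tuple with `sigma[i]` equal to the rank of element `i`, and that the cycle sends rank $k \to k+1 \to \cdots \to k+d \to k$; all function names are illustrative.

```python
from itertools import permutations

def shifted(sigma, k, d):
    """Return tau = (k, k+1, ..., k+d) o sigma: ranks k..k+d-1 move up by one, rank k+d drops to k."""
    def cycle(rank):
        if k <= rank < k + d:
            return rank + 1
        if rank == k + d:
            return k
        return rank
    return tuple(cycle(rank) for rank in sigma)

def comparison_matrix(sigma):
    """m[i][j] = 1 if sigma(i) < sigma(j), else 0."""
    n = len(sigma)
    return [[1 if sigma[i] < sigma[j] else 0 for j in range(n)] for i in range(n)]

def differing_pairs(sigma, tau):
    """Unordered index pairs on which the two comparison matrices disagree."""
    ms, mt = comparison_matrix(sigma), comparison_matrix(tau)
    n = len(sigma)
    return {frozenset((i, j)) for i in range(n) for j in range(n) if ms[i][j] != mt[i][j]}

def predicted_pairs(sigma, k, d):
    """Pairs {sigma^-1(k+d), sigma^-1(k+i)} for 0 <= i < d."""
    inverse = {rank: index for index, rank in enumerate(sigma)}
    return {frozenset((inverse[k + d], inverse[k + i])) for i in range(d)}

if __name__ == "__main__":
    n = 4
    for sigma in permutations(range(n)):
        for k in range(n - 1):
            for d in range(1, n - k):
                tau = shifted(sigma, k, d)
                assert differing_pairs(sigma, tau) == predicted_pairs(sigma, k, d)
    print("all pairs match for n =", n)
```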

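We choose the following weight function.

$$\omega(\sigma,\tau) = \begin{cases} \dfrac{1}{d} & \text{if } \tau = \sigma^{(k,d)} \text{ for some } k \text{ and } d \\ 0 & \text{otherwise.} \end{cases} \tag{14}$$

Then one may verify that $W_0 = N!(N H_N - N)$, and $W_T \le \epsilon' W_0$. To prove
Theorem [?] we need only to prove the following lemma.

Lemma 8. For any $j$ with $0 \le j < T$, $|W_j - W_{j+1}| \le 2\pi N!$.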

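Proof. Similar to the proof of Lemma 7. By Lemma [?] and (13),

$$|W_j - W_{j+1}| \le 2 \sum_{d=1}^{N-1} \frac{1}{d} \sum_{i=0}^{d-1} \sum_{\sigma} \sum_{k=0}^{N-d-1} \big\| |\psi_\sigma\rangle_{(\sigma,k+d,k+i)} \big\| \cdot \big\| |\psi_{\sigma^{(k,d)}}\rangle_{(\sigma,k+d,k+i)} \big\|.$$

Let $\gamma = [\gamma_i]_{1 \le i < N} \in \mathbb{R}^{N-1}$ be such that $\gamma_i = \big(\sum_\sigma \sum_\ell \| |\psi_\sigma\rangle_{(\sigma,\ell,\ell+i)} \|^2\big)^{1/2}$,
where we let $\ell$ range from 0 to $N-1$ and simply set the thus caused undefined
projection operators to be zero operators. Then by (13),

$$\sum_\sigma \sum_{k=0}^{N-d-1} \big\| |\psi_{\sigma^{(k,d)}}\rangle_{(\sigma,k+d,k+i)} \big\|^2 = \sum_\tau \sum_{k=0}^{N-d-1} \big\| |\psi_\tau\rangle_{(\tau,k,k+i+1)} \big\|^2.$$

Applying the Cauchy-Schwarz inequality, and in analogy with (10),

$$|W_j - W_{j+1}| \le 2 \sum_{d=1}^{N-1} \sum_{i=0}^{d-1} \frac{1}{d}\, \gamma_{d-i}\, \gamma_{i+1} = 2\gamma^t B_{N-1} \gamma. \tag{15}$$

Since $\|\gamma\|_2^2 \le N!$, we conclude that $|W_j - W_{j+1}| \le 2\pi N!$. □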
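The factor $\pi$ in these bounds comes from the norm of the matrix in the quadratic form. As a numerical illustration, the sketch below assumes that $B_n$ is the $n \times n$ Hilbert matrix with entries $1/(i+j+1)$ (a reading suggested by the sums of the form $\frac{1}{d}\gamma_{d-i}\gamma_{i+1}$, not a definition quoted from the paper) and estimates its largest eigenvalue from below by power iteration.

```python
import math

def hilbert(n):
    """n x n Hilbert matrix, entries 1 / (i + j + 1) for 0 <= i, j < n (assumed form of B_n)."""
    return [[1.0 / (i + j + 1) for j in range(n)] for i in range(n)]

def rayleigh_after_power_iteration(matrix, steps=200):
    """Rayleigh quotient after power iteration: a lower estimate of the spectral norm
    of a symmetric positive semidefinite matrix."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(steps):
        w = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    w = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
    return sum(v[i] * w[i] for i in range(n))

if __name__ == "__main__":
    for n in (5, 20, 50):
        estimate = rayleigh_after_power_iteration(hilbert(n))
        assert estimate < math.pi
        print(n, estimate)
```

The estimates increase slowly with $n$ but stay below $\pi$, consistent with the bound used in the proofs.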
3.4 Lower Bound for Element Distinctness

We modify the adversary for sorting as follows. As in Sect. 3.3, when we talk
about permutations, the underlying set is $\{0, 1, \ldots, N-1\}$.

Definition 9. An annotated permutation is a permutation $\tau$ with a marker on
a single element $r_\tau$ for some $0 \le r_\tau \le N-1$.

For every permutation $\sigma$, and all integers $k$ and $d$ as in Sect. 3.3, the annotated
permutation $\tau = \sigma^{(k,d)}$ is the same permutation as in (11) but with the rank
$k$ element marked. The only places where $M_\sigma$ and $M_\tau$ differ are at the same
entries as those in (13).

We use the same weight function as in (14). Then $W_0 = N!(N H_N - N)$ and
$W_T \le \epsilon' W_0$. We need only to prove the following lemma.

Lemma 10. For any integer $j$ with $0 \le j < T$, $|W_j - W_{j+1}| \le 2\pi N!\sqrt{N}$.

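Proof. Almost identical to the proof for Lemma 8, except that we now require a
second vector $\delta = [\delta_i]_{1 \le i < N} \in \mathbb{R}^{N-1}$ with $\delta_i = \big(\sum_\tau \| |\psi_\tau\rangle_{(\tau,r_\tau,r_\tau+i)} \|^2\big)^{1/2}$. Then
by (13),

$$\sum_\sigma \sum_{k=0}^{N-d-1} \big\| |\psi_{\sigma^{(k,d)}}\rangle_{(\sigma,k+d,k+i)} \big\|^2 = \sum_{\tau : r_\tau < N-d} \big\| |\psi_\tau\rangle_{(\tau,r_\tau,r_\tau+i+1)} \big\|^2 \le \delta_{i+1}^2.$$

In analogy with (15), we have

$$|W_j - W_{j+1}| \le 2 \sum_{d=1}^{N-1} \sum_{i=0}^{d-1} \frac{1}{d}\, \gamma_{d-i}\, \delta_{i+1} = 2\gamma^t B_{N-1}\delta.$$

Besides having $\|\gamma\|_2^2 \le N!$ as in the proof of Lemma 8, we also have that

$$\|\delta\|_2^2 = \sum_{i=1}^{N-1} \sum_\tau \big\| |\psi_\tau\rangle_{(\tau,r_\tau,r_\tau+i)} \big\|^2 \le N!(N-1) < N!\,N.$$

Therefore, $|W_j - W_{j+1}| \le 2\pi\sqrt{N!}\sqrt{N!\,N} = 2\pi N!\sqrt{N}$. □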

4 A $\log_3(N)$ Algorithm for Ordered Searching

We begin by considering binary search trees on which our quantum algorithm is
based. Let $T$ be a binary tree with $N \ge 2$ leaves. We put colored pebbles on the
(internal) vertices of $T$ subject to the following 2 conditions:

(A) on every path from the root of $T$ to a leaf, there is exactly 1 pebble of each
color, and

(B) the number of pebbles on any vertex $v \in T$ is at least as large as the total
number of pebbles on its proper ancestors.
We say that $T$ is covered by $N'$ pebbles if we can satisfy the 2 above conditions
using at most $N'$ pebbles of each color. We want to minimize the maximum
number $N'$ of pebbles used of any color. We say a covering is fair if it uses
the same number of pebbles of every color. We say a covering is tight if, for all
vertices $v \in T$, we have that the number $p_v$ of pebbles on $v$ equals the total
number of pebbles on its proper ancestors, or there are no pebbles on any of the
ancestors of $v$. We require the following two lemmas.

Lemma 11. For every even integer $N \ge 2$, there exists a binary tree with $N$
leaves that can be fairly and tightly covered by $N' = \lfloor \frac{1}{3}N + \log_2(N) \rfloor$ pebbles
using $2^s$ colors, where $s = \lfloor \log_4(N/2) \rfloor$.

Lemma 12. Let integer-valued function $F$ be recursively defined by

$$F(N) = \begin{cases} F\big(\lfloor \frac{1}{3}N + \log_2(N) + 1 \rfloor\big) + 1 & \text{if } N > 8 \\ 1 & \text{if } N \le 8. \end{cases}$$

Then $F(N) = \log_3(N) + O(1)$.
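The recurrence of Lemma 12 is easy to evaluate numerically. The sketch below assumes the recurrence reads $F(N) = F(\lfloor N/3 + \log_2 N + 1 \rfloor) + 1$ for $N > 8$ and $F(N) = 1$ otherwise (the factor $1/3$ is inferred from the $\log_3$ conclusion), and compares $F(3^k)$ with $k$.

```python
import math

def F(n):
    """Iteratively evaluate the assumed recurrence of Lemma 12."""
    steps = 1                      # base case: F(n) = 1 for n <= 8
    while n > 8:
        n = math.floor(n / 3 + math.log2(n) + 1)
        steps += 1
    return steps

if __name__ == "__main__":
    for k in range(2, 16):
        n = 3 ** k
        # The gap F(n) - log_3(n) stays bounded as n grows.
        print(n, F(n), F(n) - k)
```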

As in Sect. 3.2, we assume the oracle $x = (x_0, \ldots, x_{N-1}) \in \{0,1\}^N$ is a
binary string of non-zero Hamming weight. The problem is to determine the
leftmost 1 in $x$, that is, to compute $f(x) = \min\{0 \le i < N \mid x_i = 1\}$. Let $T$
be a binary tree with $N$ leaves for which Lemma 11 holds. Let $s = \lfloor \log_4(N/2) \rfloor$
and $N' = \lfloor \frac{1}{3}N + \log_2(N) \rfloor$ be as in the lemma. We label the $N$ leaves of $T$ by
$\{0, \ldots, N-1\}$ from left to right. Let $\ell_{f(x)}$ denote the leaf labelled by $f(x)$, and
let $\mathcal{V}$ denote the path from the root of $T$ to the parent of $\ell_{f(x)}$. We think of $\mathcal{V}$
as the path the classical search algorithm would traverse if searching for $f(x)$ in
tree $T$.

Let $C = \{c_0, \ldots, c_{2^s-1}\}$ be the set of $2^s$ colors used in Lemma 11. For each
color $c \in C$, let $V_c$ denote the set of vertices in $T$ populated by a pebble of color $c$.
By Condition (A), there are at most $N'$ such vertices, that is, $|V_c| \le N'$. Let $v_c$
denote the unique vertex in $V_c$ that is on path $\mathcal{V}$. We think of vertex $v_c$ as the
root of the subtree “containing” leaf $\ell_{f(x)}$. Note that, by definition, $v_c \in \mathcal{V}$ for
every color $c \in C$, and that $\sum_{v \in \mathcal{V}} p_v = 2^s$ by Condition (A).

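Our algorithm utilizes 3 unitary operators, $U_1$, $O'_x$, and $U_2$. The first operator,
$U_1$, is defined by

$$U_1 : |v\rangle|0\rangle \longmapsto |v\rangle\Big(\frac{1}{\sqrt{p_v}} \sum_c |c\rangle\Big) \qquad (v \in T), \tag{16}$$

where the summation is over all colors $c \in C$ that are represented by a pebble on
vertex $v$. We refer to $U_1$ as the coloring operator and its inverse as the un-coloring
operator.

The query operator $O'_x$ is defined by

$$O'_x : |v\rangle \longmapsto \begin{cases} |v; x_i\rangle & \text{if there are no pebbles on the parent of } v \\ (-1)^{x_i}|v\rangle & \text{otherwise,} \end{cases} \tag{17}$$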




where $i$ denotes the label of the rightmost leaf in the left subtree of vertex $v$.
Query operator $O'_x$ is clearly unitary (or rather, can be extended to a unitary
operator since it is only defined on a proper subspace). Operator $O'_x$ is slightly
different from, but equivalent to, the query operator defined in Sect. 2. It mimics
the classical search algorithm by querying the bit $x_i$ that corresponds to the
rightmost leaf in the left subtree of $v$.

We also use a unitary operator $U_2$ that maps each vertex to a superposition
over the leaves in its subtree. For every vertex and leaf $u$ in $T$, let $L(u)$ denote
the set of leaves in the subtree rooted at $u$, and let

$$|\Psi_u\rangle = \sum_{\ell \in L(u)} \frac{1}{\sqrt{2^{d(u,\ell)}}}\, |\ell\rangle, \tag{18}$$

where $d(u, \ell)$ denotes the absolute value of the difference in depths of $u$ and
leaf $\ell$. The unitary operator $U_2$ is (partially) defined as follows. For every vertex
$v \in T$ with no pebbles on its parent,

$$|v; 0\rangle \longmapsto |\Psi_{\mathrm{right}(v)}\rangle \tag{19.1}$$

$$|v; 1\rangle \longmapsto |\Psi_{\mathrm{left}(v)}\rangle \tag{19.2}$$

and for every vertex $v \in T$ with pebbles on its parent,

$$|v\rangle \longmapsto \frac{1}{\sqrt{2}}\big(|\Psi_{\mathrm{right}(v)}\rangle - |\Psi_{\mathrm{left}(v)}\rangle\big). \tag{19.3}$$

Here left($v$) denotes the left child of $v$, and right($v$) the right child.

Our quantum algorithm starts in the initial state $|0\rangle$ and produces the final
state $|\ell_{f(x)}\rangle$. Let $F(N)$ denote the number of queries used by the algorithm on
an oracle $x$ of size $N$.

1. We first set up a superposition over all $2^s$ colors, $\frac{1}{\sqrt{2^s}} \sum_{c \in C} |0\rangle|c\rangle$.

2. We then apply our exact quantum search algorithm recursively. For each
color $c \in C$ in quantum parallel, we search recursively among the vertices
in $V_c$, hereby determining the root $v_c \in V_c$ of the subtree containing the
leaf $\ell_{f(x)}$. Since $|V_c| \le N'$, this requires at most $F(N'+1)$ queries to oracle $x$
and produces the superposition $\frac{1}{\sqrt{2^s}} \sum_{c \in C} |v_c\rangle|c\rangle$. Since every vertex $v_c$ in this
sum is on the path $\mathcal{V}$, we can rewrite the sum as

$$\frac{1}{\sqrt{2^s}} \sum_{v \in \mathcal{V}} |v\rangle \Big( \sum_{c \in C : v_c = v} |c\rangle \Big).$$

3. We then apply the un-coloring operator, producing the superposition
$\frac{1}{\sqrt{2^s}} \sum_{v \in \mathcal{V}} \sqrt{p_v}\,|v\rangle|0\rangle$. Ignoring the second register which always holds a zero,
this is

$$\frac{1}{\sqrt{2^s}} \sum_{v \in \mathcal{V}} \sqrt{p_v}\,|v\rangle.$$

That is, we have (recursively) obtained a superposition over the vertices on
the path $\mathcal{V}$ from the root of $T$ to the parent of the leaf $\ell_{f(x)}$ labelled by $f(x)$.

4. We then apply the operator $U_2 O'_x$, producing the final state

$$\frac{1}{\sqrt{2^s}} \sum_{v \in \mathcal{V}} \sqrt{p_v}\; U_2 O'_x |v\rangle,$$

which one can show equal to $|\ell_{f(x)}\rangle$. Thus, a final measurement of this state
yields $f(x)$ with certainty.

The total number of queries to the oracle $x$ is at most $F(N'+1)+1$, and thus,
by Lemma 12, the algorithm uses at most $\log_3(N) + O(1)$ queries. Theorem 13
follows.

Theorem 13. The above described quantum algorithm for searching an ordered
list of $N$ elements is exact and uses at most $\log_3(N) + O(1)$ queries.
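The last step of the algorithm, where the path superposition collapses onto the target leaf, can be illustrated numerically. The sketch below is a simplified model under several assumptions that are not taken verbatim from the paper: $T$ is a complete binary tree of depth `total_depth`, the pebble counts along the path are tight and read 1, 1, 2, 4, ... starting at depth `first_pebbled_depth`, the first pebbled vertex is mapped to $|\Psi\rangle$ of its child on the path, and every later vertex is mapped to $(|\Psi_{\text{on}}\rangle - |\Psi_{\text{off}}\rangle)/\sqrt{2}$, where "on" and "off" are its children on and off the path (the combined effect of a phase query and a signed split).

```python
import math

def psi(depth, index, total_depth):
    """Leaf amplitudes of |Psi_u> for the vertex u at `depth` with left-to-right `index`."""
    span = 2 ** (total_depth - depth)            # number of leaves below u
    amplitude = 2 ** (-(total_depth - depth) / 2)
    vec = [0.0] * (2 ** total_depth)
    for leaf in range(index * span, (index + 1) * span):
        vec[leaf] = amplitude
    return vec

def final_state(total_depth, target, first_pebbled_depth):
    """Apply the assumed query-and-split step to the path superposition for leaf `target`."""
    n = 2 ** total_depth
    depths = list(range(first_pebbled_depth, total_depth))
    counts = [1] + [2 ** j for j in range(len(depths) - 1)]   # tight counts 1, 1, 2, 4, ...
    total = sum(counts)                                        # plays the role of 2**s
    out = [0.0] * n
    for depth, pebbles in zip(depths, counts):
        on = target >> (total_depth - depth - 1)   # child on the path, at depth + 1
        off = on ^ 1                                # its sibling
        psi_on = psi(depth + 1, on, total_depth)
        psi_off = psi(depth + 1, off, total_depth)
        if depth == first_pebbled_depth:
            contribution = psi_on
        else:
            contribution = [(a - b) / math.sqrt(2) for a, b in zip(psi_on, psi_off)]
        weight = math.sqrt(pebbles / total)
        for leaf in range(n):
            out[leaf] += weight * contribution[leaf]
    return out

if __name__ == "__main__":
    state = final_state(total_depth=3, target=5, first_pebbled_depth=0)
    print([round(a, 6) for a in state])
```

In this model the amplitudes on all leaves other than the target cancel pairwise, which is the interference effect that makes the final measurement exact.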



5 Concluding Remarks and Open Problems 

The inner product of two quantum states is a measure for their distinguishability. 
We have proposed a weighted all-pairs inner product argument as a tool for 
proving lower bounds in the quantum black box model. The possibility of using 
non-uniform weights seems particularly suitable when proving lower bounds for 
non-symmetric (possibly partial) functions. It could be interesting to consider
other measures than inner products, as discussed, for instance, by Zalka [?],
Jozsa and Schlienz [?], and Vedral [?].

The result of Grigoriev, Karpinski, Meyer auf der Heide, and Smolensky [?]
implies that if only comparisons are allowed, the randomized decision tree
complexity of element distinctness has the same $\Omega(N \log N)$ lower bound as sorting.
Interestingly, their quantum complexities differ dramatically: the quantum
algorithm by Buhrman et al. [?] uses only $O(N^{3/4} \log N)$ comparisons. There is still
a big gap between this upper bound and our lower bound of $\Omega(N^{1/2} \log N)$. One
way of closing this gap might be to consider quantum time-space tradeoffs, as
has been done for the classical case [?].

Our algorithm for searching an ordered list with complexity $\log_3(N) + O(1)$ is
based on the classical binary search algorithm. The quantum algorithm initiates
several independent walks/searches at the root of the binary search tree. These
searches traverse down the tree faster than classically by cooperating, and they
eventually all reach the leaf we are searching for in roughly $\log_3(N)$ steps. It could
be interesting to consider if similar ideas can be used to speed up other classical
algorithms. For instance one may consider other applications of operators like $U_2$
acting on rooted trees and graphs.
Abstract. We introduce a new model for studying quantum data structure
problems: the quantum cell probe model. We prove a lower bound
for the static predecessor problem in the address-only version of this
model where, essentially, we allow quantum parallelism only over the
‘address lines’ of the queries. This model subsumes the classical cell probe
model, and many quantum query algorithms like Grover’s algorithm fall
into this framework. We prove our lower bound by obtaining a round
elimination lemma for quantum communication complexity. A similar
lemma was proved by Miltersen, Nisan, Safra and Wigderson [?] for
classical communication complexity, but their proof does not generalise to
the quantum setting.

We also study the static membership problem in the quantum cell probe
model. Generalising a result of Yao [?], we show that if the storage
scheme is implicit, that is it can only store members of the subset and
‘pointers’, then any quantum query scheme must make $\Omega(\log n)$ probes.
We also consider the one-round quantum communication complexity of
set membership and show tight bounds.



1 Introduction 

A static data structure problem consists of a set of data $D$, a set of queries $Q$,
a set of answers $A$, and a function $f : D \times Q \to A$. The aim is to store the
data efficiently and succinctly, so that any query can be answered with only
a few probes to the data structure. In a seminal paper [?], Yao introduced
the (classical) cell probe model for studying static data structure problems in
the classical setting. Thereafter, this model has been used extensively to prove
upper and lower bounds for several data structure problems (see [?, ?, ?, ?]).
A classical $(s, w, t)$ cell probe scheme for $f$ has two components: a storage scheme
and a query scheme. Given the data to be stored, the storage scheme stores it
as a table of $s$ cells, each cell $w$ bits long. The query scheme has to answer

*** Part of this work was done while visiting UC Berkeley and DIMACS, under a Sarojini
Damodaran International Fellowship grant.
† Supported by NSF grant CCR-9987845 and a joint IAS-DIMACS postdoctoral
fellowship.
queries about the data stored. Given a query, the query scheme computes the
answer to that query by making at most $t$ probes to the stored table, where each
probe reads one cell at a time. The storage scheme is deterministic whereas the
query scheme can be deterministic or randomised. The goal is to study tradeoffs
between $s$, $t$ and $w$. For an overview of results in this model, see the survey by
Miltersen [?].
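To make the roles of $s$, $w$ and $t$ concrete, here is a minimal sketch of a classical deterministic cell probe scheme for set membership: the storage scheme is a sorted array (so $s = n$ cells) and the query scheme is binary search, with every table read counted as one probe. The class and function names are illustrative assumptions, not part of the paper.

```python
class ProbeCountingTable:
    """A table of cells that counts how many cells have been read."""

    def __init__(self, cells):
        self._cells = list(cells)
        self.probes = 0

    def __len__(self):
        return len(self._cells)

    def read(self, address):
        """Read one cell; each call is one probe."""
        self.probes += 1
        return self._cells[address]

def store_sorted(subset):
    """Storage scheme: one element of the subset per cell, in sorted order."""
    return ProbeCountingTable(sorted(subset))

def member(table, x):
    """Query scheme: binary search, using about log2(s) probes in the worst case."""
    lo, hi = 0, len(table)
    while lo < hi:
        mid = (lo + hi) // 2
        value = table.read(mid)
        if value == x:
            return True
        if value < x:
            lo = mid + 1
        else:
            hi = mid
    return False

if __name__ == "__main__":
    table = store_sorted({3, 8, 15, 21, 34, 55, 89})
    print(member(table, 34), table.probes)
```

This scheme stores only members of the subset, so it is also an example of the kind of storage the paper later calls implicit.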

In this paper, we study static data structure problems, such as the static
membership problem and the static predecessor problem, when the algorithm is
allowed to query the table using a quantum superposition. We formalise this by
defining the quantum cell probe model similar to the quantum bit probe model
of [?]. We show a lower bound for the predecessor problem in a restricted version
of this model, which we call the address-only quantum cell probe model. In the
predecessor problem, the storage scheme has to store a subset $S$ of size at most
$n$ from the universe $[m]$, such that given any query element $x \in [m]$, one can
quickly find the predecessor of $x$ in $S$.
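As a small illustration of the problem statement, the sketch below answers predecessor queries on a sorted list. It assumes the convention that the predecessor of $x$ is the largest element of $S$ not exceeding $x$; the text does not fix whether $x$ itself counts, so this choice is an assumption.

```python
from bisect import bisect_right

def predecessor(sorted_subset, x):
    """Largest element of the subset that is <= x, or None if no such element exists."""
    position = bisect_right(sorted_subset, x)
    return sorted_subset[position - 1] if position > 0 else None

if __name__ == "__main__":
    subset = [3, 8, 15, 21]
    for query in (2, 10, 15, 40):
        print(query, predecessor(subset, query))
```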

Result 1 (Lower bound for predecessor): Suppose that we have
an address-only quantum cell probe solution to the static predecessor
problem, where the universe size is $m$ and the subset size is at most
$n$, using $n^{O(1)}$ cells each containing $O(\log m)$ bits. Then the number
of queries is at least $\Omega(\sqrt{\log\log m})$ as a function of $m$, and at least
$\Omega(\log^{1/3} n)$ as a function of $n$.

We then consider the static membership problem. Here one has to answer
membership queries instead of predecessor queries. Yao [?] showed that if the
universe is large enough, any classical deterministic implicit scheme for the static
membership problem must make $\Omega(\log n)$ probes to the table in the worst case.
An implicit scheme either stores a ‘pointer value’ (viz. a value which is not an
element of the universe) or an element of $S$ in a cell. In particular, it is not
allowed to store an element of the universe which is not a member of $S$. We
generalise Yao’s result to the quantum setting.

Result 2 (informal statement): If the storage scheme is implicit then,
if the universe is large enough compared to the number of cells of storage,
the quantum query algorithm must make $\Omega(\log n)$ probes.

Remarks: 

1. Our address-only quantum cell probe model subsumes the classical cell probe
model. Hence, our lower bound for the static predecessor problem is a
generalisation of a similar result shown for the classical cell probe model with randomised
query schemes, by Miltersen et al. [?]. This lower bound is the best known for
classical randomised query schemes, if the storage scheme uses $n^{O(1)}$ cells each
containing $O(\log m)$ bits. Thus, our quantum lower bounds are as strong as the
best known classical randomised lower bounds. The best upper bound known
uses $O(n)$ cells of storage, each cell contains $O(\log m)$ bits, and answers
predecessor queries with $O(\min(\log\log m/\log\log\log m, \sqrt{\log n/\log\log n}))$ probes. In
fact, it is a classical deterministic query scheme. For deterministic schemes, the
above bound is tight. Both the above bound, and its optimality for deterministic
schemes, have been proved by Beame and Fich [?].

2. It is known that querying in superposition gives a speed up over classical
algorithms for certain data retrieval problems, the most notable one being Grover’s
algorithm for searching an unordered list of $n$ elements using $O(\sqrt{n})$ quantum
queries. The power of quantum querying for data structure problems was
studied in the context of static membership by Radhakrishnan et al. [?]. In their
paper, they worked in the quantum bit probe model, which is our quantum cell
probe model where the cell size is just one bit. They showed, roughly speaking,
that quantum querying does not give much advantage over classical schemes for
the set membership problem. Our result above seems to suggest that quantum
search is perhaps not more powerful than classical search for the predecessor
problem as well.

3. In the next section, we formally describe the “address-only” restrictions we
impose on the query algorithm. Informally, they amount to this: we allow
quantum parallelism over the ‘address lines’ going into the table, but we have a fixed
quantum state on the ‘data lines’. This restriction on quantum querying does not
make the problem trivial. In fact, many non-trivial quantum search algorithms,
such as Grover’s algorithm [?] and Hoyer and Neerbek’s algorithm [?], already
satisfy these restrictions.

4. For the static membership problem, Fredman, Komlós and Szemerédi [?] have
shown a classical deterministic cell probe solution where the storage scheme uses
$O(n)$ cells each containing $O(\log m)$ bits, and the query scheme makes only a
constant number of probes. In this solution, the storage scheme may store
elements of the universe in the table which are not members of the subset to be
stored. Hence the restriction that the storage scheme be implicit is necessary
for any such result. We note that implicit storage schemes include many of the
standard storage schemes like sorted array, hash table, search trees etc.

1.1 Techniques 

The lower bounds for the static membership problem shown in the quantum 
bit probe model relied on linear algebraic techniques. Unfortunately, these 
techniques appear to be powerless for the quantum cell probe model. In fact, 
to show the lower bound above for the static predecessor problem, we use a 
connection between quantum data structure problems and two-party quantum 
communication complexity, similar to what was used by Miltersen, Nisan, Safra
and Wigderson [?] for showing the classical lower bound. They proved a technical
lemma in classical communication complexity called the round elimination
lemma and derived from it lower bounds for various static data structure prob- 
lems. In this paper we prove an analogue of their round elimination lemma for 
the quantum communication complexity model, which we then use to show the 
quantum lower bound for the static predecessor problem. The quantum round 
elimination lemma also has applications to other quantum communication com- 
plexity problems, which might be interesting on their own. 
Suppose $f : X \times Y \to Z$ is a function. In the communication game
corresponding to $f$, Alice gets a string $x \in X$, Bob gets a string $y \in Y$ and they have
to compute $f(x, y)$. In the communication game corresponding to $f^{(n)}$, Alice gets
$n$ strings $x_1, \ldots, x_n \in X$; Bob gets an integer $i \in [n]$, a string $y \in Y$, and a copy
of the strings $x_1, \ldots, x_{i-1}$. Their aim is to compute $f(x_i, y)$. Suppose a protocol
for $f^{(n)}$ is given where Alice starts, and her first message is $a$ bits long, where $a$
is much smaller than $n$. Intuitively, it would seem that since Alice does not know
$i$, the first round of communication cannot give much information about $x_i$, and
thus, would not be very useful to Bob. The round elimination lemma justifies
this intuition. Moreover, we show that this is true even if Bob also gets copies
of $x_1, \ldots, x_{i-1}$, a case which is needed in many data structure applications.
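The input structure of the $n$-fold game can be written out explicitly. The sketch below only models who sees which part of the input and what the target value is; it says nothing about protocols. The function names and the use of a 1-based index $i$ are illustrative assumptions.

```python
def n_fold_inputs(xs, i, y):
    """Split an instance of the n-fold game into Alice's and Bob's views (i is 1-based)."""
    alice_view = list(xs)                      # Alice sees all n strings
    bob_view = (i, y, list(xs[: i - 1]))       # Bob sees i, y and the first i - 1 strings
    return alice_view, bob_view

def n_fold_answer(f, xs, i, y):
    """The value both players must compute: f applied to the i-th string and y."""
    return f(xs[i - 1], y)

if __name__ == "__main__":
    equal = lambda x, y: x == y
    alice, bob = n_fold_inputs(["ab", "cd", "ef"], 2, "cd")
    print(alice, bob, n_fold_answer(equal, ["ab", "cd", "ef"], 2, "cd"))
```

Alice's view does not contain $i$, which is the informal reason her first short message cannot say much about $x_i$.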

Result 3 (Quantum round elimination lemma, informal statement):
A $t$ round quantum protocol for $f^{(n)}$ with Alice starting, gives
us a $t - 1$ round protocol for $f$ with Bob starting, with similar message
complexity and error probability.

Round reduction arguments have been given earlier in quantum
communication complexity, most notably by Nayak, Ta-Shma and Zuckerman [?]. However,
for technical reasons, the previous arguments do not go far enough to prove lower
bounds for the communication games arising from data structure problems like
the predecessor problem. We need a technical quantum version of the round
elimination lemma of Miltersen et al. [?], to prove the desired lower bounds.

We also study the set membership communication game $\mathrm{MEM}_{m,n}$, where
Alice is given an element $x$ of a universe of size $m$, and Bob is given a subset $S$
of the universe of size at most $n$. They have to communicate and decide whether
$x \in S$. We consider bounded error one round quantum communication protocols
for this problem in both the cases of Alice and Bob speaking. We give tight
upper and lower bounds for this problem in both these cases.

Result 4: The bounded error one round quantum communication
complexity of the set membership problem $\mathrm{MEM}_{m,n}$, when Alice starts, is
$\Theta(\log n + \log\log m)$, and when Bob starts, is $\Theta(n + \log\log m)$.

1.2 Organisation of the Paper 

Section 2 contains definitions of various terms that will be used throughout
the paper. In Section 3 we discuss some lemmas that will be needed in the
proofs of the main theorems. Section 4 contains the proof of the quantum round
elimination lemma. In Section 5 we prove our quantum lower bounds for the
static predecessor problem. The proofs of our results on implicit storage schemes
for the static membership problem, and the one round quantum communication
complexity of set membership, as well as proofs of various lemmas which have
been omitted due to lack of space, can be found in the full version [?].

2 Definitions 

In this section we define some of the terms which we will be using in this paper. 
2.1 The Quantum Cell Probe Model

A quantum $(s, w, t)$ cell probe scheme for a static data structure problem $f :
D \times Q \to A$ has two components: a classical deterministic storage scheme that
stores the data $d \in D$ in a table $T_d$ using $s$ cells each containing $w$ bits, and
a quantum query scheme that answers queries by ‘quantumly probing a cell at
a time’ at most $t$ times. Formally speaking, the table $T_d$ for the stored data is
made available to the query algorithm in the form of an oracle unitary transform
$O_d$. To define $O_d$ formally, we represent the basis states of the query algorithm
as $|j, b, z\rangle$, where $j \in \{0, \ldots, s-1\}$ is a binary string of length $\log s$, $b$ is a binary
string of length $w$, and $z$ is a binary string of some fixed length. Here, $j$ denotes
the address of a cell in the table $T_d$, $b$ denotes the qubits which will hold the
contents of a cell and $z$ stands for the rest of the qubits in the query algorithm.
$O_d$ maps $|j, b, z\rangle$ to $|j, b \oplus (T_d)_j, z\rangle$, where $(T_d)_j$ is a bit string of length $w$ and
denotes the contents of the $j$th cell in $T_d$. A quantum query scheme with $t$ probes
is just a sequence of unitary transformations

$$U_0 \to O_d \to U_1 \to O_d \to \cdots \to U_{t-1} \to O_d \to U_t$$

where the $U_j$’s are arbitrary unitary transformations that do not depend on the data
stored. For a query $q \in Q$, the computation starts in an observational basis state
$|q\rangle|0\rangle$, where we assume that the ancilla qubits are initially in the basis state $|0\rangle$.
Then we apply in succession the operators $U_0, O_d, \ldots, O_d, U_t$, and measure the
final state. The answer consists of the values on some of the output wires of the
circuit. We require that the answer be correct with probability at least 2/3.
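On basis states the oracle is just a permutation, which the following sketch makes explicit. It represents a basis state $|j, b, z\rangle$ as a tuple of integers and the table as a list of $w$-bit integers; the helper name `make_oracle` is an assumption for illustration.

```python
def make_oracle(table, w):
    """Return the action of O_d on basis states (j, b, z): XOR cell j into the data register b."""
    for cell in table:
        assert 0 <= cell < 2 ** w, "each cell must hold a w-bit value"

    def apply(state):
        j, b, z = state
        return (j, b ^ table[j], z)

    return apply

if __name__ == "__main__":
    oracle = make_oracle([0b101, 0b011, 0b110, 0b000], w=3)
    state = (2, 0b001, 7)
    once = oracle(state)
    twice = oracle(once)
    print(once, twice)   # applying the oracle twice restores the original basis state
```

Because XOR is its own inverse, the map is a bijection on basis states and therefore extends to a unitary transform.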

We now formally define the address-only quantum cell probe model. Here the
storage scheme is as in the general model, but the query scheme is restricted to
be ‘address-only’. This means that the state vector before a query to the oracle
is always a tensor product of a state vector on the address and work qubits (the
$|j,z\rangle$ part in $|j,b,z\rangle$ above), which can depend on the query element and the
probe number, and a state vector on the data qubits (the $|b\rangle$ part in $|j, b, z\rangle$
above), which is independent of the query element but can vary with the probe
number. Intuitively, we are only making use of quantum parallelism over the
address lines. This mode of querying a table subsumes classical querying, and
also many non-trivial quantum algorithms like Grover’s algorithm [?], Hoyer and
Neerbek’s algorithm [?] etc. satisfy this condition. For Grover, and Hoyer and
Neerbek, the state vector on the data qubit is $(|0\rangle - |1\rangle)/\sqrt{2}$, independent of the
probe number.
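The fixed data state $(|0\rangle - |1\rangle)/\sqrt{2}$ turns the XOR oracle into a phase oracle, which is why such algorithms are address-only. The sketch below checks this for one-bit cells using a dictionary of amplitudes indexed by $(j, b)$; the work register $z$ is omitted and the function names are illustrative assumptions.

```python
import math

def apply_oracle(amplitudes, table):
    """Apply |j, b> -> |j, b XOR table[j]> to a state given as {(j, b): amplitude}."""
    out = {}
    for (j, b), amplitude in amplitudes.items():
        key = (j, b ^ table[j])
        out[key] = out.get(key, 0.0) + amplitude
    return out

def with_minus_data_state(address_amplitudes):
    """Tensor an address superposition with the data state (|0> - |1>) / sqrt(2)."""
    r = 1 / math.sqrt(2)
    state = {}
    for j, amplitude in enumerate(address_amplitudes):
        state[(j, 0)] = amplitude * r
        state[(j, 1)] = -amplitude * r
    return state

if __name__ == "__main__":
    table = [0, 1, 1, 0]
    before = with_minus_data_state([0.5, 0.5, 0.5, 0.5])
    after = apply_oracle(before, table)
    # Same data state, but address j has picked up the sign (-1) ** table[j].
    expected = with_minus_data_state([0.5, -0.5, -0.5, 0.5])
    print(all(abs(after[k] - expected[k]) < 1e-12 for k in expected))
```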



2.2 Quantum Communication Protocols 

We consider two party quantum communication protocols as defined by Yao [?].
Suppose $f : X \times Y \to Z$ is a function. In the communication game corresponding
to $f$, Alice gets a string $x \in X$, Bob gets a string $y \in Y$ and they have to compute
$f(x,y)$. We say a quantum protocol computes $f$ with $\epsilon$-error, if for any input
$(x,y) \in X \times Y$, the probability that the protocol outputs the correct result
$f(x, y)$ is at least $1 - \epsilon$. The term ‘bounded error quantum protocol’ means that
$\epsilon = 1/3$.

We require that Alice and Bob make a secure copy of their inputs before 
beginning the protocol. This is possible since the inputs to Alice and Bob are 
in computational basis states. Thus the qubits of Alice and Bob holding their 
inputs are never sent as messages, remain unchanged throughout the protocol 
and are never measured i.e. some work qubits are measured to determine the 
result of the protocol. We call such protocols secure. We will assume henceforth 
that all our protocols are secure. 

We now define a class of quantum protocols called safe protocols, which will 
be used in the statement of the round elimination lemma. 

Definition 1 (Safe quantum protocol). A $[t,c,a,b]^A$ safe quantum protocol
$P$ is a secure protocol where the per round message lengths of Alice and Bob
are $a$ and $b$ qubits respectively, Alice starts first and the communication goes on
for $t$ rounds. The notation $[t,c,a,b]^B$ means the same as above, except that Bob
starts first. We allow the first message to have an overhead of $c$ qubits i.e. if
Alice starts, the first message is $a + c$ qubits long and if Bob starts, the first
message is $b + c$ qubits long. The density matrix of the overhead is independent
of the inputs to Alice and Bob. If $c = 0$, we abbreviate the notation to a $[t,a,b]^A$
protocol.

Remark: The concept of a safe quantum protocol may look strange at first. The 
reason we need to define it, intuitively speaking, is as follows. The communication 
games arising from data structure problems often have an asymmetry between 
the message lengths of Alice and Bob. This asymmetry is crucial to prove lower 
bounds on the number of rounds of communication. In the previous quantum 
round reduction arguments, the complexity of the first message in the protocol 
increases quickly as the number of rounds is reduced and the asymmetry gets 
lost. This leads to a problem where the first message soon gets big enough to 
potentially convey substantial information about the input of one player to the 
other, destroying any hope of proving strong lower bounds on the number of 
rounds. The concept of a safe protocol allows us to get around this problem. 
We show through a careful quantum information theoretic analysis of the round 
reduction process, that in a safe protocol, though the complexity of the first 
message increases a lot, this increase is confined to the safe overhead and so, 
the information content does not increase much. This gives us an asymmetry in 
the information flow. This is sufficient to let the round elimination arguments 
go through in various applications. 

In this paper we will deal with quantum protocols with public coins. Intu- 
itively, a public coin quantum protocol is a probability distribution over (coin- 
less) quantum protocols. We shall henceforth call the standard definition of a 
quantum protocol as coinless. Our definition is similar to the classical scenario, 
where a randomised protocol with public coins is a probability distribution over 
deterministic protocols. We note however, that our definition of a public coin 
quantum protocol is not the same as that of a quantum protocol with prior
entanglement, which has been studied previously (see e.g. [?]). Our definition is
weaker, in that it does not allow the unitary transformations of Alice and Bob
to alter the ‘public coin’.

Definition 2 (Public coin quantum protocol). In a quantum protocol with
a public coin, there is a shared quantum state called a public coin, of the form
$\sum_c \sqrt{p_c}\,|c\rangle_A|c\rangle_B$, where $p_c$ are non-negative real numbers and $\sum_c p_c = 1$. Alice and Bob
make a secure copy of the coin before commencing the protocol. Thus, if the coin
is in a basis state $|c\rangle$, the unitary transformations of Alice and Bob do not alter
it. The coin is never measured.

Hence, one can think of the public coin quantum protocol to be a probability 
distribution, with probability pc, over coinless quantum protocols indexed by the 
coin basis states c. A safe public coin quantum protocol is thus, a probability 
distribution over safe coinless quantum protocols. 

Remark: We need to define public coin quantum protocols, so as to make use of
the harder direction of Yao’s minimax lemma [?]. The minimax lemma is the
main tool which allows us to convert average case round reduction arguments 
to worst case arguments. We need worst case type round reduction arguments 
in proving lower bounds for the rounds complexity of communication games 
arising from data structure applications. This is because many of these lower 
bound proofs use some notion of “self-reducibility” , arising from the original 
data structure problem, which fails to hold in the average case. 

For an input $x,y$, we define the error $\epsilon^P_{x,y}$ of the (coinless or public coin)
protocol $P$ to be the probability that the result of $P$ on $x,y$ is not equal to
$f(x,y)$. For a coinless quantum protocol $P$, given a probability distribution $\mu$
on the inputs $x,y$ of a specified size, we define the average error $\epsilon^P_\mu$ of $P$ with
respect to $\mu$ as the expectation over $\mu$ of the error of $P$ on inputs $x, y$. We define
$\epsilon^P$ to be the worst case error of $P$ on inputs $x,y$.

In the proof of the round elimination lemma, we need to do parallel repe- 
titions of public coin protocols. We also construct new protocols from old ones 
using both the directions of Yao’s minimax lemma. We note that all these oper- 
ations preserve the “safety” of the protocol. 

3 Preliminaries 

In this section we state some facts which will be useful in what follows. 

3.1 Quantum Cell Probe Complexity and Communication 

In this subsection, we describe the connection between the quantum cell probe
complexity of a static data structure problem and the quantum communication
complexity of an associated communication game. Let $f : D \times Q \to A$ be a
static data structure problem. Consider a two-party communication problem
where Alice is given a query $q \in Q$, Bob is given data $d \in D$, and they have to
communicate and find out the answer $f(d,q)$. We have the following lemma.

Lemma 1. Suppose we have a quantum $(s, w, t)$-cell probe solution to the static
data structure problem $f$. Then we have a $[2t, \log s + w, \log s + w]^A$ safe coinless
quantum protocol for the corresponding communication problem. If the query
scheme is address-only, we can get a $[2t, \log s, \log s + w]^A$ safe coinless quantum
protocol.

Proof. The protocol just simulates the cell probe solution. Note that if the query 
scheme is address-only, the messages from Alice to Bob need consist only of the 
‘address’ part. The details are omitted. □ 

3.2 Background from Quantum Information Theory 

In this subsection, we discuss some basic facts from quantum information theory 
that will be used in the proof of the round elimination lemma. We follow the 
notation of Nayak, Ta-Shma and Zuckerman’s paper [?]. For a good account of
quantum information theory, see the book by Nielsen and Chuang [?].

If $A$ is a quantum system with density matrix $\rho$, then $S(A) = S(\rho) =
-\mathrm{Tr}\, \rho \log \rho$ is the von Neumann entropy of $A$. If $A$, $B$ are two disjoint quantum
systems, their mutual information is defined as $I(A : B) = S(A) + S(B) - S(AB)$.

Suppose $X$ is a classical random variable. Let $X$ be in a mixed state $\{p_x, |x\rangle\}$,
$|x\rangle$ orthonormal. Let $Q$ be a quantum encoding of $X$, i.e. it is an encoding
$|x\rangle \mapsto \sigma_x$, where $\sigma_x$ is a density matrix. Thus, the joint density matrix of $(X, Q)$ is
$\sum_x p_x |x\rangle\langle x| \otimes \sigma_x$. Define $\sigma = \sum_x p_x \sigma_x$ to be the density matrix of the average
encoding. Then $S(XQ) = S(X) + \sum_x p_x S(\sigma_x)$, and hence

$$I(X : Q) = S(\sigma) - \sum_x p_x S(\sigma_x).$$

If $X$ can be written as $X = X_1 X_2$, where $X_1$, $X_2$ are classical random variables,
and $Q$ is a quantum encoding of $X$, we can define $I((X_1 : Q) \mid X_2 = x_2)$ to
be the mutual information between $X_1$ and $Q$ when $X_2$ is fixed to $x_2$.

We now state the following propositions, whose proofs are to be found in the
full version [?].

Proposition 1. Suppose $M = M_1 M_2$ is a quantum encoding of a classical random
variable $X$, where the density matrix of $M_2$ is independent of $X$. Let $M_1$
be supported on $a$ qubits. Then $I(X : M) \le 2a$.

Proposition 2. Suppose $M$ is a quantum encoding of a classical random variable
$X$. Suppose $X = X_1 X_2 \ldots X_n$, where the $X_i$ are classical independent random
variables. Then $I(X_1 \ldots X_n : M) = \sum_{i=1}^{n} I(X_i : M X_1 \ldots X_{i-1})$.

Proposition 3. Let $X$, $Y$ be classical random variables and $M$ be a quantum
encoding of $(X, Y)$. Then $I(Y : MX) = I(X : Y) + E_X[I((Y : M) \mid X = x)]$.

We use the trace norm on linear operators to measure the “distance” between
two density matrices. For a linear operator $A$, the trace norm of $A$ is defined
as $\|A\|_t = \mathrm{Tr} \sqrt{A^\dagger A}$. The trace distance between two density matrices $\rho_1, \rho_2$,
$\|\rho_1 - \rho_2\|_t$, bounds the $\ell_1$ distance between the probability distributions on the
outcomes obtained by a measurement on $\rho_1$ and $\rho_2$.

In the proof of the round elimination lemma, we will use the “average encoding
theorem” in the strong form by Klauck [7]. For completeness, we state it below
in the version required for our purposes. A short proof sketch of the theorem
can be found in the full version [?].

Theorem 1 (Average encoding theorem). Let $X$, $Q$ be two disjoint quantum
systems where $X$ is a classical random variable, which takes value $x$ with
probability $p_x$, and $Q$ is a quantum encoding $x \mapsto \sigma_x$ of $X$. Let the density matrix
of the average encoding be $\sigma = \sum_x p_x \sigma_x$. Then

$$\sum_x p_x \|\sigma_x - \sigma\|_t \le \sqrt{(2 \ln 2)\, I(X : Q)}.$$



4 The Quantum Round Elimination Lemma 

In this section we prove our round elimination lemma for safe public coin quan- 
tum protocols. Since a public coin quantum protocol can be converted to a 
coinless protocol at the expense of an additional “safe” overhead in the first 
message, we also get a similar round elimination lemma for coinless protocols. 
We can decrease the overhead to logarithmic in the total bit size of the inputs 
by a technique similar to the public to private coins conversion for classical
randomised protocols [?]. But since the statement of the round elimination lemma
is cleanest for safe public coin quantum protocols, we give it below in this form 
only. 

Lemma 2 (Round elimination lemma). Suppose $f : X \times Y \to Z$ is a
function. Suppose the communication game $f^{(n)}$ has a $[t, c, a, b]^A$ safe public
coin quantum protocol with worst case error at most $\delta$. Then there is a
$[t-1, c+a, a, b]^B$ safe public coin quantum protocol for $f$ with worst case error
at most $\epsilon = 2\delta + 2(8a \ln 2 / n)^{1/2}$.

Proof. By the harder direction of Yao’s minimax lemma [?], it suffices to give,
for any probability distribution $D$ on $X \times Y$, a $[t-1, c+a, a, b]^B$ safe coinless
quantum protocol $P$ for $f$ with average distributional error $\epsilon^P_D \le \epsilon$. To this end,
we will first construct a probability distribution $D^*$ on $X^n \times [n] \times Y$. By the
easier direction of the minimax lemma, we will get a $[t, c, a, b]^A$ safe coinless
protocol $P^*$ for $f^{(n)}$ with distributional error, for distribution $D^*$, $\epsilon^{P^*}_{D^*} \le \delta$. We
shall construct the desired protocol $P$ from the protocol $P^*$.

The distribution $D^*$ is constructed as follows. Choose $i \in [n]$ uniformly at
random. Choose independently, for each $j \in [n]$, $(x_j, y_j) \in X \times Y$ according to
distribution $D$. Set $y = y_i$ and throw away $y_j$, $j \ne i$.

Let $M$ be the first message of Alice in $P^*$. By the definition of a safe protocol,
$M$ has two parts: $M_1$, $a$ qubits long, and the “safe” overhead $M_2$, $c$
qubits long. Let the input to Alice be denoted by the classical random variable
$X = X_1 X_2 \ldots X_n$, where $X_i$ is the classical random variable corresponding
to the $i$th input to Alice. Define $\epsilon^{P^*}_{D^*;\, i;\, x_1, \ldots, x_{i-1}}$ to be the average error of $P^*$ under
distribution $D^*$ when $i$ is fixed and $X_1, \ldots, X_{i-1}$ are fixed to $x_1, \ldots, x_{i-1}$. From
Propositions 1, 2 and 3, using the fact that under distribution $D^*$ the variables
$X_1, \ldots, X_n$ are independent classical random variables, we get that

$$\frac{2a}{n} \ge \frac{I(X : M)}{n} = E_{i,X}\big[ I((X_i : M) \mid X_1, \ldots, X_{i-1} = x_1, \ldots, x_{i-1}) \big].$$

Also

$$\delta \ge \epsilon^{P^*}_{D^*} = E_{i,X}\big[ \epsilon^{P^*}_{D^*;\, i;\, x_1, \ldots, x_{i-1}} \big].$$

By two applications of Markov’s inequality, we see that there exists a choice of $i$
and $x_1, \ldots, x_{i-1}$ such that, if we define a new distribution $\tilde{D}$ to be distribution
$D^*$ where $i$ is fixed to the above choice and $X_1, \ldots, X_{i-1}$ are fixed to $x_1, \ldots, x_{i-1}$,
then the error of the protocol $P^*$ on distribution $\tilde{D}$ satisfies $\epsilon^{P^*}_{\tilde{D}} \le 2\delta$, and the mutual
information between $X_i$ and $M$ under distribution $\tilde{D}$ satisfies $I_{\tilde{D}}(X_i : M) \le 4a/n$.

Consider now the protocol $P'$ for the function $f$ defined as follows. $P'$ is
a $[t, c, a, b]^A$ safe coinless quantum protocol. Alice is given $x \in X$ and Bob is
given $y \in Y$. Both Alice and Bob set $i$ to the above choice (which is known
to both parties) and $X_1 \ldots X_{i-1}$ to the known values $x_1 \ldots x_{i-1}$. Alice puts an
independent copy of a pure state $|\psi\rangle$ for each of the inputs $X_{i+1}, \ldots, X_n$. She
sets $X_i = x$ and Bob sets his input $Y = y$. Then they run protocol $P^*$ on these
inputs. Here $|\psi\rangle = \sum_x \sqrt{p_x}\, |x\rangle$, where $p_x$ is the (marginal) probability of $x$
under distribution $D$. Since $P^*$ is a secure protocol, the probability that $P'$ makes
an error for an input $(x, y)$, $\epsilon^{P'}_{x,y}$, is the average probability of error of $P^*$ under
distribution $\tilde{D}$ with $X_i, Y$ fixed to $x, y$. Hence, the average probability of error
of $P'$ under distribution $D$ is $\epsilon^{P'}_D = \epsilon^{P^*}_{\tilde{D}} \le 2\delta$. Also, because of the “secureness”
of $P^*$, we notice that the mutual information between $X_i$ and the first message
of $P'$ (under distribution $D$) is the same as the mutual information between
$X_i$ and the first message of $P^*$ (under distribution $\tilde{D}$). Thus, if $M$ denotes the
first message of $P'$ and $X$ denotes the register $X_i$ holding the input $x$, then the
mutual information under distribution $D$ satisfies $I_D(X : M) \le 4a/n$.

Since in protocol $P'$ the first message of Alice has small mutual information
with her input, we can give an argument similar to Nayak et al. [?], and finally
get a $[t-1, c+a, a, b]^B$ safe coinless quantum protocol $P$ for $f$ with

$$\epsilon^P_D \le \epsilon^{P'}_D + 2\big((2 \ln 2)\, I_D(X : M)\big)^{1/2} \le 2\delta + 2(8a \ln 2 / n)^{1/2} = \epsilon.$$

For this we have to use the version of the “average encoding theorem” as in
Theorem 1 instead of the version of [?], which held for uniform probability
distributions only. We observe, in the construction of $P$ from $P'$, that though
there is an overhead of $a + c$ qubits on the first message of Bob, it is a “safe”
overhead. The details are left to the full version [?].

This completes the proof of the round elimination lemma. □ 



From this lemma, we can prove the round elimination lemma in the form it 
will be used in various applications. 

Lemma 3 (Round elimination lemma for fixed error). Suppose $f : X \times Y \to Z$
is a function. There exist universal constants $R$, $C$ such that the following
holds: Suppose that the communication game $f^{(Ra)}$ has a $[t, c, a, b]^A$ safe
public coin quantum protocol with error probability at most 1/3. Then the
communication game $f$ has a $[t-1, C(a+c), Ca, Cb]^B$ safe public coin quantum
protocol with error probability at most 1/3. For example, R = 10^ and C = 51 
suffices. 

Proof. (Sketch) Repeat the $[t, c, a, b]^A$ protocol for $f^{(Ra)}$ $C$ times in parallel
and take the majority of the results. This brings the error probability down to
a suitably small value. Now apply Lemma 2 on the repeated protocol. □
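
The amplification step in the sketch above can be illustrated numerically. The following is a minimal sketch, assuming the $C$ parallel runs err independently, each with probability at most 1/3; the function name `majority_error` is ours, not the paper's. The majority answer can be wrong only if more than half of the runs are wrong, so the binomial tail computed here is an upper bound on the error after repetition.

```python
from fractions import Fraction
from math import comb


def majority_error(per_run_error: Fraction, repetitions: int) -> Fraction:
    """Probability that more than half of `repetitions` independent runs err.

    Assumption (ours): runs are independent and each errs with probability
    exactly `per_run_error`. With an odd number of runs there are no ties.
    """
    if repetitions % 2 == 0:
        raise ValueError("use an odd number of repetitions to avoid ties")
    q = per_run_error
    threshold = repetitions // 2 + 1  # smallest count that forms a majority
    return sum(
        (comb(repetitions, k) * q**k * (1 - q) ** (repetitions - k)
         for k in range(threshold, repetitions + 1)),
        Fraction(0),
    )


# Error 1/3 per run, C = 51 runs, as in the example constants of Lemma 3.
amplified = majority_error(Fraction(1, 3), 51)
print(float(amplified))
```

The resulting value plays the role of $\delta$ when Lemma 2 is applied to the repeated protocol, whose message lengths have grown by the factor $C$.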



5 Quantum Lower Bounds for Predecessor 

In this section, we prove our lower bounds for the static predecessor problem in 
the address-only quantum cell probe model. The proof is essentially similar to 
the classical proof in Miltersen et al. [?], and hence we give only a brief sketch.

Theorem 2. Suppose we have a O {log m),f) quantum address-only cell 

probe solution to the static predecessor problem, where the universe size is m and 
the subset size is at most $n$. Then the number of queries $t$ is at least $\Omega(\log^{1/3} n)$
as a function of $n$, and it is at least $\Omega(\sqrt{\log \log m})$ as a function of $m$.

Proof. (Sketch) We basically imitate the proof of Miltersen et al. [?], but in
our quantum setting. By Lemma 1 it suffices to prove a lower bound on the
number of rounds of a communication game. For that, we alternately use
“self-reducibility” arguments and the round elimination lemma (Lemma 3) to keep
reducing the number of rounds in the communication game. One just has to
notice that the applicability of Lemma 3 does not depend on the “safe” overhead
at all, but rather on the per round message complexity of the first player. This
allows the quantum arguments to go through in a manner similar to the classical
arguments, and hence proves our theorem. □

Miltersen et al. also apply the round elimination lemma to prove lower bounds 
for other data structure problems and communication complexity problems. We 
remark that we can extend all those results in a similar fashion to the quantum 
world. 

Abstract. We study complete axiomatizations for different notions of
probabilistic bisimulation on a recursion free process algebra with prob- 
ability and nondeterminism under alternating and non-alternating se- 
mantics. The axioms that do not involve probability coincide with the 
original axioms of Milner. The axioms that involve probability differ de- 
pending on the bisimulation under examination and on the semantics 
that is used, thus revealing the implications of the different choices. 



1 Introduction 



Probabilistic process algebras have been studied extensively in the literature [?],
and classical concepts from concurrency theory have been extended
to the probabilistic case. Probabilistic models of concurrent systems are classified
in [?] into reactive, generative, and stratified. Both in reactive and generative
systems the transitions that leave from a state are equipped with probabilities: in 
generative systems the sum of the probabilities of the transitions that leave from 
a state is required to be 1, while in reactive systems the sums of the probabilities 
of the transitions that leave from a state and are labeled by the same action are 
required to be 1. The stratified model imposes some extra structure which is not 
relevant for the purpose of this paper. 

Motivated by the fact that neither reactive nor generative nor stratified systems
model real nondeterminism in the process algebraic sense, and motivated
as well by the desire to separate clearly probability from nondeterminism, in
[?] a model of probabilistic automata is introduced and studied. Probabilistic
automata, and more precisely the simple probabilistic automata of [?], are like
ordinary automata (labeled transition systems) except that a transition leads
to a probability distribution over states rather than to a single state. Thus, the
choice between different transitions is a nondeterministic choice, while the choice
of a state within a transition is a probabilistic choice. A similar model was
proposed in [?] based on the Concurrent Markov Chains of [?]. In such a model,
also known as the alternating model, there is a clear distinction between
nondeterministic states, which enable only transitions leading to a unique state, and
probabilistic states, which enable a unique transition leading to a distribution over
states. There is a strict alternation between nondeterministic and probabilistic
states. Both the alternating model and the model of [?], which in contrast
to the alternating model is also known as the non-alternating model, are
conservative extensions of labeled transition systems and in several contexts can
be seen as the same model from the point of view of expressiveness.

Yet, the alternating and non-alternating models do have some differences
that can be seen already when we study bisimulation relations. Probabilistic
bisimulation was first defined in [?], then extended to the alternating model in
[?] and extended to the non-alternating model in [?]. While defining probabilistic
bisimulation in the non-alternating model it was shown in [?] that we
obtain two different relations if we simulate a transition using deterministic and
randomized schedulers, respectively. Such a difference does not appear in the
alternating model unless we change the definitions of probabilistic bisimulations
so that probabilistic states are not taken into account.

In this paper we show the differences and similarities of the alternating and 
non-alternating models by analyzing the axiomatizations of the different bisimu- 
lation relations in the different frameworks. We define a process algebra without 
recursion and provide it with an alternating and non-alternating semantics. We 
then define a strong bisimulation relation that coincides with the relation of [?]
in the alternating model and with the bisimulation of [?] in the non-alternating
model. We also define the version of strong bisimulation, called strong proba- 
bilistic bisimulation, where a transition can be simulated by using randomized 
schedulers. Finally, we study the complete axiomatizations of all the relations 
that we introduce. Besides obtaining axioms where probability and nondetermin- 
ism are separated clearly, thus confirming the original goal behind the definitions 
of the models, we discover that the axiomatizations of strong bisimulation are 
the same in the alternating and non-alternating models. Furthermore, the ax- 
iomatizations of strong bisimulation and strong probabilistic bisimulation are 
the same in the alternating model, while they differ by an axiom that expresses 
the ability to combine transitions probabilistically in the non-alternating model. 

We also study the weak bisimulations of [?], showing that the alternating
and non-alternating semantics are incomparable.

Other studies of axiomatizations for probabilistic bisimulation relations
appear in [?]. Of these axiomatizations, only [?] deals with a reactive model.
The axiomatization of [?] includes recursion as well.

The rest of the paper is structured as follows. Section 2 gives some preliminary
definitions and notational conventions; Section 3 defines the Probabilistic
Process Algebra (PPA) and its alternating and non-alternating semantics; Section 4
defines the bisimulation relations that we axiomatize; Section 5 axiomatizes the
relations of Section 4, discusses the axioms, and outlines the main ideas behind
the proofs of completeness; Section 6 contains some concluding remarks.

2 Preliminaries 

A discrete probability measure over a set $X$ is a function $\mu : 2^X \to [0, 1]$ such
that $\mu(X) = 1$ and for each countable family $\{X_i\}$ of pairwise disjoint elements
of $2^X$, $\mu(\cup_i X_i) = \sum_i \mu(X_i)$. Denote by $Disc(X)$ the set of discrete probability
measures over $X$. Given an element $x$ of $X$ we denote by $\delta(x)$ the probability
measure $\mu$ such that $\mu(\{x\}) = 1$, and we call it the Dirac measure on $x$. Given two
measures $\mu_1, \mu_2$ and a real number $p \in [0, 1]$ we define the convex combination
$p\mu_1 + (1-p)\mu_2$ of $\mu_1$ and $\mu_2$ to be the probability measure $\mu$ such that, for each
set $Y$, $\mu(Y) = p\mu_1(Y) + (1-p)\mu_2(Y)$.

A probabilistic automaton is a tuple $(Q, q, \Sigma, D)$, where $Q$ is a set of states,
$q \in Q$ is a start state, $\Sigma$ is a set of actions, and $D \subseteq Q \times \Sigma \times Disc(Q)$ is a transition
relation. An ordinary automaton can be seen as a probabilistic automaton where 
each transition leads to a Dirac measure. Probabilistic automata are used as the 
basis to give an operational semantics to our probabilistic process algebra. 
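
The definitions of this section can be made concrete with a small sketch. It assumes finitely supported measures represented as Python dictionaries from points to exact rational masses; the helper names `dirac`, `convex` and `measure`, and the toy automaton, are illustrative choices of ours rather than notation from the paper.

```python
from fractions import Fraction


def dirac(x):
    """Dirac measure delta(x): all mass on the single point x."""
    return {x: Fraction(1)}


def convex(p, mu1, mu2):
    """Convex combination p*mu1 + (1 - p)*mu2 of two discrete measures."""
    p = Fraction(p)
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    result = {}
    for weight, mu in ((p, mu1), (1 - p, mu2)):
        for point, mass in mu.items():
            if weight * mass != 0:  # keep only the support
                result[point] = result.get(point, Fraction(0)) + weight * mass
    return result


def measure(mu, subset):
    """mu(Y): total mass that mu assigns to the set Y."""
    return sum((mass for point, mass in mu.items() if point in subset),
               Fraction(0))


# A toy probabilistic automaton (Q, q, Sigma, D); all names are illustrative.
states = {"q0", "q1", "q2"}
start = "q0"
actions = {"a"}
transitions = [
    ("q0", "a", convex(Fraction(1, 4), dirac("q1"), dirac("q2"))),
    ("q0", "a", dirac("q1")),  # Dirac target: an ordinary transition
]
```

The two transitions leaving `q0` with the same action are resolved nondeterministically, while the state reached within the first transition is chosen probabilistically.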

3 Probabilistic Process Algebra 

We denote by $\Sigma$ the set of observable actions or labels, and let $Act = \Sigma \cup \{\tau\}$ be
the full set of actions. We call $\tau$ the silent action and we let $a$ range over $Act$.

Let NProc denote the set of nondeterministic processes, ranged over by $E$,
and PProc denote the set of probabilistic processes, ranged over by $P$. Finally,
let Proc = NProc ∪ PProc denote the set of processes, ranged over by $Q$. The
syntax for our Probabilistic Process Algebra is given by the following rules:

$E ::= 0 \mid E + E \mid a.P$
$P ::= \Delta(E) \mid P \oplus_p P$

The expression 0 is the inactive process having no transitions. The + operator
is the classical nondeterministic sum as defined in [?]. Process $a.P$ performs
action $a$ and then offers a probabilistic choice described by the probabilistic
process $P$. A probabilistic process is either a Dirac distribution over a single
nondeterministic process, described by $\Delta(E)$, or a combination of the distributions
associated with two probabilistic processes, described by the $\oplus_p$ operator.

For notational convenience we can represent sums of nondeterministic processes
by $\sum_{i \in I} E_i$ and sums of probabilistic processes by $\bigoplus_{i \in I} [p_i] P_i$. Such
representations are justified by the fact that in this paper both the operators + and
$\oplus_p$ turn out to be associative and commutative. We let $\mu$ range over distributions
over nondeterministic processes and sometimes we represent a distribution
over nondeterministic processes by $\{[p_i] E_i\}_{i \in I}$.

Note that PPA is characterized by a strict alternation between probabilistic
and nondeterministic processes as in [?]. The alternation is kept in the alternating
semantics of the calculus and is removed in the non-alternating semantics.

Table 1 contains the operational semantics of PPA, where $E \xrightarrow{a} \mu$ describes a
transition labeled by $a$ that leaves from $E$ and leads to a probability distribution
$\mu$, while $P \mapsto \mu$ states that the probability distribution associated with $P$ is $\mu$.
The rules of Table 1 describe the transitions of a probabilistic automaton; thus,
the target of a transition of Table 1 is a probability distribution over expressions
rather than a single expression.

Table 1. Operational semantics of PPA

Probabilistic rules

idle: $\Delta(E) \mapsto \delta(E)$

pchoice: $\dfrac{P_1 \mapsto \mu_1 \quad P_2 \mapsto \mu_2}{P_1 \oplus_p P_2 \mapsto p\mu_1 + (1-p)\mu_2}$

Common nondeterministic rules

lchoice: $\dfrac{E_1 \xrightarrow{a} \mu}{E_1 + E_2 \xrightarrow{a} \mu}$

rchoice: $\dfrac{E_2 \xrightarrow{a} \mu}{E_1 + E_2 \xrightarrow{a} \mu}$

P-idle: $\dfrac{P \mapsto \mu}{P \xrightarrow{\tau} \mu}$

Rule for non-alternating model

NA-prefix: $\dfrac{P \mapsto \mu}{a.P \xrightarrow{a} \mu}$

Rule for alternating model

A-prefix: $a.P \xrightarrow{a} \delta(P)$



Table 1 is subdivided into three sections. The first section defines the probability
distributions associated with a probabilistic process. Specifically, process
$\Delta(E)$ is associated with a Dirac distribution over the single process $E$ (rule idle),
while the probability distribution associated with the probabilistic combination
$P_1 \oplus_p P_2$ is obtained by convex combination weighted by $p$ of the distributions
associated with $P_1$ and $P_2$, respectively (rule pchoice). The second section of
Table 1 describes the operators whose semantics does not change in the alternating
and non-alternating interpretations. Specifically, the semantics of the +
operator is the same as in CCS (rules lchoice and rchoice). Rule P-idle describes
the unique transition that is enabled from a probabilistic process, which
moves silently to the distribution associated with the process. This rule is essential
in the alternating semantics, where probabilistic processes can be reached;
however, the same rule is convenient also in the non-alternating semantics to
obtain an axiomatization of probabilistic bisimulation that reveals better the
relationship between the two semantics. The third section of Table 1 contains the
rules for action-prefixing, which constitute the key difference between the alternating
and non-alternating semantics. In the non-alternating semantics process
$a.P$ moves with action $a$ to the distribution identified by $P$ (rule NA-prefix),
while in the alternating semantics process $a.P$ moves with action $a$ to process
$P$ (rule A-prefix), from which a silent move leads to the distribution identified
by $P$ (cf. rule P-idle).

Remark 1 . There is a folklore idea of how an alternating system can be trans- 
lated into a non-alternating system and vice versa. Specifically, to move from an 
alternating system to a non-alternating system it is sufficient to remove all the 
probabilistic states and collapse the transitions that go through a probabilistic 
state, while to move from a non-alternating system to an alternating system 
it is sufficient to split each transition into two transitions, the first of which 
leads to a probabilistic state. The operational semantics of Table 1 respects the
folklore transformation: for each process E the transformation of its alternating 
semantics coincides with its non-alternating semantics and vice versa. 
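
The rules of Table 1 and the folklore transformation of Remark 1 can be sketched as follows. This is an illustrative encoding under our own assumptions: terms are frozen dataclasses, distributions are dictionaries with exact rational masses, and the names `dist`, `transitions` and `collapse` are hypothetical helpers, not part of the paper.

```python
from dataclasses import dataclass
from fractions import Fraction

TAU = "tau"  # the silent action


@dataclass(frozen=True)
class Nil:            # 0
    pass


@dataclass(frozen=True)
class Sum:            # E + E
    left: object
    right: object


@dataclass(frozen=True)
class Prefix:         # a.P
    action: str
    cont: object


@dataclass(frozen=True)
class Delta:          # Delta(E)
    proc: object


@dataclass(frozen=True)
class PChoice:        # P (+)_p P
    p: Fraction
    left: object
    right: object


def dist(P):
    """The distribution mu with P |-> mu (rules idle and pchoice)."""
    if isinstance(P, Delta):
        return {P.proc: Fraction(1)}
    if isinstance(P, PChoice):
        result = {}
        for weight, mu in ((P.p, dist(P.left)), (1 - P.p, dist(P.right))):
            for target, mass in mu.items():
                if weight * mass != 0:
                    result[target] = result.get(target, Fraction(0)) + weight * mass
        return result
    raise TypeError("not a probabilistic process")


def transitions(Q, alternating):
    """All (action, distribution) pairs enabled from Q."""
    if isinstance(Q, Nil):
        return []
    if isinstance(Q, Sum):  # lchoice and rchoice
        return transitions(Q.left, alternating) + transitions(Q.right, alternating)
    if isinstance(Q, Prefix):
        if alternating:     # A-prefix: move to the probabilistic state itself
            return [(Q.action, {Q.cont: Fraction(1)})]
        return [(Q.action, dist(Q.cont))]  # NA-prefix
    if isinstance(Q, (Delta, PChoice)):    # P-idle
        return [(TAU, dist(Q))]
    raise TypeError("not a process")


def collapse(E):
    """Folklore translation for a nondeterministic E: skip probabilistic states."""
    result = []
    for action, mu in transitions(E, alternating=True):
        (P, _), = mu.items()  # A-prefix targets are always Dirac
        result.append((action, dist(P)))
    return result


# Example term: a.(Delta(0) (+)_{1/3} Delta(b.Delta(0)))
inner = Prefix("b", Delta(Nil()))
E = Prefix("a", PChoice(Fraction(1, 3), Delta(Nil()), Delta(inner)))
```

On this example, collapsing the alternating transitions of `E` yields exactly its non-alternating transitions, which is the direction of Remark 1 from the alternating to the non-alternating system.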

4 Bisimulation 

In this section we define bisimulation relations in the strong and weak version
based on deterministic and randomized schedulers. In the non-alternating model
our definitions of strong and weak (probabilistic) bisimulation coincide with those
of [?]; in the alternating model strong bisimulation coincides with the strong
bisimulation of [?], while weak probabilistic bisimulation coincides with the weak
bisimulation of [?].

4.1 Lifting Equivalence Relations 

An equivalence relation over Proc can be lifted to a relation over distributions
over Proc by stating that two distributions are equivalent if they assign the same
probability to the same equivalence classes [?].

Formally, let $\mathcal{R}$ be an equivalence relation over Proc. Two probability distributions
$\mu_1$ and $\mu_2$ are $\mathcal{R}$-equivalent, written $\mu_1 \mathcal{R}_p \mu_2$, iff for every equivalence
class $\mathcal{C} \in Proc/\mathcal{R}$ we have $\mu_1(\mathcal{C}) = \mu_2(\mathcal{C})$.
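
The lifting can be checked mechanically for finitely supported distributions. The sketch below assumes that the equivalence relation is given by a function `class_of` mapping each process to a canonical representative of its class; this representation, and the name `lifted_equivalent`, are our own illustrative choices.

```python
from fractions import Fraction


def lifted_equivalent(mu1, mu2, class_of):
    """True iff mu1 and mu2 assign the same mass to every equivalence class.

    `class_of` (an assumption of this sketch) maps an element to a canonical
    representative of its equivalence class.
    """
    def class_masses(mu):
        masses = {}
        for element, mass in mu.items():
            key = class_of(element)
            masses[key] = masses.get(key, Fraction(0)) + mass
        # Drop classes of mass zero so that supports compare cleanly.
        return {key: mass for key, mass in masses.items() if mass != 0}

    return class_masses(mu1) == class_masses(mu2)


# Illustrative relation with two classes: {E1, E1'} and {E2}.
representative = {"E1": "E1", "E1'": "E1", "E2": "E2"}
mu_a = {"E1": Fraction(1, 2), "E2": Fraction(1, 2)}
mu_b = {"E1": Fraction(1, 4), "E1'": Fraction(1, 4), "E2": Fraction(1, 2)}
mu_c = {"E1": Fraction(1, 4), "E2": Fraction(3, 4)}
```

Here `mu_a` and `mu_b` are equivalent although they differ pointwise, because the mass of `E1` is merely split inside one class, whereas `mu_c` moves mass across classes.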

4.2 Strong Bisimulation 

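An equivalence relation $\mathcal{R} \subseteq Proc \times Proc$ is a strong bisimulation iff, for all
$Q_1, Q_2 \in Proc$ such that $Q_1 \mathcal{R} Q_2$, and for all $a \in Act$,

- if $Q_1 \xrightarrow{a} \mu_1$, then there exists $\mu_2$ such that $Q_2 \xrightarrow{a} \mu_2$ and $\mu_1 \mathcal{R}_p \mu_2$;

- if $Q_2 \xrightarrow{a} \mu_2$, then there exists $\mu_1$ such that $Q_1 \xrightarrow{a} \mu_1$ and $\mu_1 \mathcal{R}_p \mu_2$.

We write $Q_1 \sim Q_2$ whenever there is a strong bisimulation that relates $Q_1$ and $Q_2$.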

Proposition 1. Strong bisimulation is a congruence in PPA. 

In a strong bisimulation a transition of a process must be simulated by a single 
transition of the other process chosen deterministically among the transitions 
that are enabled. It was observed in [?] that deterministic schedulers may not
be enough in a randomized setting. 

Example 1. Consider E = a.{A{Ei) A{E2)) + a.{A{Ei) A{E2)) and

F = a.{A{Ei) (B\/2 A{E2)) + a.{A{Ex) (B^/12 A{E2)) + a.{A{Ei) (Bi/2, A{E2)) 
whose non-alternating semantics is represented in Figure 1. Each bundle of edges
corresponds to a transition. The difference between E and F is that F enables
an additional transition which is obtained by combining probabilistically the two
transitions of E. There is no strong bisimulation between E and F if $E_1$ and $E_2$
are not bisimilar; however, E and F would be bisimilar if we permit the use of
randomized schedulers to simulate the extra transition of F.

Fig. 1. Two processes not strongly bisimilar

Example 1 suggests a new bisimulation relation where it is possible to combine
several transitions labeled by the same action in a unique transition. We
say that there is a combined transition labeled by action $a$ from a process $E$ to a
distribution $\mu$, denoted by $E \xrightarrow{a}_c \mu$, iff there exists a collection of distributions
$\{\mu_i\}_{i \in I}$ and probabilities $\{p_i\}_{i \in I}$ such that $\sum_i p_i = 1$, $\mu = \sum_i p_i \mu_i$, and
$\forall i : E \xrightarrow{a} \mu_i$.
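
A combined transition is simply a convex combination of the target distributions of ordinary transitions with the same label. The sketch below illustrates this; the function name `combine` is ours, and the weights 1/4 and 3/4 in the example are hypothetical values chosen for illustration, not the weights of Example 1.

```python
from fractions import Fraction


def combine(weighted_targets):
    """Target of a combined transition.

    `weighted_targets` is a list of pairs (p_i, mu_i), where each mu_i is the
    target of an ordinary transition with the same action and the p_i are
    non-negative weights summing to 1.
    """
    weights = [Fraction(p) for p, _ in weighted_targets]
    if any(p < 0 for p in weights) or sum(weights, Fraction(0)) != 1:
        raise ValueError("weights must be non-negative and sum to 1")
    mu = {}
    for p, (_, mu_i) in zip(weights, weighted_targets):
        for target, mass in mu_i.items():
            if p * mass != 0:
                mu[target] = mu.get(target, Fraction(0)) + p * mass
    return mu


# Two a-transitions of a process (hypothetical weights) ...
mu_1 = {"E1": Fraction(1, 4), "E2": Fraction(3, 4)}
mu_2 = {"E1": Fraction(3, 4), "E2": Fraction(1, 4)}
# ... and the transition obtained by scheduling each with probability 1/2.
mu_combined = combine([(Fraction(1, 2), mu_1), (Fraction(1, 2), mu_2)])
```

With these weights the combined target is the uniform distribution over `E1` and `E2`, which neither ordinary transition reaches on its own; a randomized scheduler is needed to match it.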

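An equivalence relation $\mathcal{R} \subseteq Proc \times Proc$ is a strong probabilistic bisimulation
iff, for all $Q_1, Q_2 \in Proc$ such that $Q_1 \mathcal{R} Q_2$, and for all $a \in Act$,

- if $Q_1 \xrightarrow{a} \mu_1$, then there exists $\mu_2$ such that $Q_2 \xrightarrow{a}_c \mu_2$ and $\mu_1 \mathcal{R}_p \mu_2$;

- if $Q_2 \xrightarrow{a} \mu_2$, then there exists $\mu_1$ such that $Q_1 \xrightarrow{a}_c \mu_1$ and $\mu_1 \mathcal{R}_p \mu_2$.

We write $Q_1 \sim_c Q_2$ whenever there is a strong probabilistic bisimulation that
relates $Q_1$ and $Q_2$.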

Proposition 2. Strong probabilistic bisimulation is a congruence in PPA. 

It is easy to observe that strong bisimulation is just a particular case of 
strong probabilistic bisimulation. An important result is that in the alternating 
semantics strong bisimulation coincides with strong probabilistic bisimulation 
(cf. Proposition 3). Thus, randomized schedulers do not add any extra power
to the ability of simulating a transition. Roughly speaking, in the alternating 
model each probability distribution is declared explicitly through a probabilistic 
state before being drawn. Strong bisimulation must preserve the declarations as 
well, and on the other hand there is no way to declare the combination of two 
transitions. 

Proposition 3. Under the alternating semantics a strong probabilistic bisimu- 
lation is also a strong bisimulation. 

Table 2. Weak transitions (the inference rules of this table did not survive extraction)

Proof sketch. Let $\mathcal{R}$ be a strong probabilistic bisimulation and suppose $Q_1 \mathcal{R} Q_2$.
If $Q_1$ and $Q_2$ are probabilistic processes, then they enable only one transition,
the silent transition that selects probabilistically one process. Thus, there is
nothing to combine. If $Q_1$ and $Q_2$ are nondeterministic processes and $Q_1 \xrightarrow{a} \mu$,
then $\mu$ is a Dirac distribution over some probabilistic process $P$. The combined
transition $Q_2 \xrightarrow{a}_c \mu'$ that simulates $Q_1 \xrightarrow{a} \mu$ leads to a distribution that
assigns probability 1 to the equivalence class of $P$. Thus, any transition from $Q_2$
that contributes to $Q_2 \xrightarrow{a}_c \mu'$ leads to a distribution that assigns probability 1
to the equivalence class of $P$. This shows that $Q_1 \sim Q_2$.

4.3 Weak Bisimulation 

Weak bisimulation is the same as strong bisimulation except that we replace
transitions by weak transitions. That is, we are not interested in observing the
silent behavior of a system. A weak transition, whose formal definition is given
in Table 2, is a probabilistic extension of the weak transitions of [?]. We schedule
several transitions as long as they always lead to the occurrence of a single external
action $a$, possibly interleaved by silent actions. For notational convenience,
given a sequence $s$ of actions in $Act$, we denote by $\hat{s}$ the sequence obtained from
$s$ by removing all $\tau$’s.

An equivalence relation $\mathcal{R} \subseteq Proc \times Proc$ is a weak bisimulation iff, for all
$Q_1, Q_2 \in Proc$ such that $Q_1 \mathcal{R} Q_2$, and for all $a \in Act$,

- if $Q_1 \xrightarrow{a} \mu_1$ then there exists $\mu_2$ such that $Q_2 \stackrel{\hat{a}}{\Longrightarrow} \mu_2$ and $\mu_1 \mathcal{R}_p \mu_2$;

- if $Q_2 \xrightarrow{a} \mu_2$ then there exists $\mu_1$ such that $Q_1 \stackrel{\hat{a}}{\Longrightarrow} \mu_1$ and $\mu_1 \mathcal{R}_p \mu_2$.

We write $Q_1 \approx Q_2$ whenever there is a weak bisimulation that relates $Q_1$ and
$Q_2$.

We can define a weak combined transition relation ($\Longrightarrow_c$), as we have done
in the strong case, by combining simple weak transitions. Thus, it is possible
to define weak probabilistic bisimulation by replacing weak transitions by weak
combined transitions in the definition above.

4.4 Observation Congruence

As in ordinary CCS [?], weak bisimulation is not preserved by the nondeterministic
choice operator +. The classical example is given by the pair of processes
$a.0 \approx \tau.a.0$, which are equivalent both according to weak bisimulation and weak
probabilistic bisimulation, where $a.0 + b.0 \not\approx \tau.a.0 + b.0$. Following the classical
approach of [?], we define observation congruence and probabilistic observation
congruence.

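Two processes $Q_1, Q_2$ are congruent, written $Q_1 \simeq Q_2$, if $Q_1$ and $Q_2$ are both
nondeterministic or both probabilistic, and for all $a \in Act$,

- if $Q_1 \xrightarrow{a} \mu_1$ then there exists $\mu_2$ such that $Q_2 \stackrel{a}{\Longrightarrow} \mu_2$ and $\mu_1 \approx_p \mu_2$;

- if $Q_2 \xrightarrow{a} \mu_2$ then there exists $\mu_1$ such that $Q_1 \stackrel{a}{\Longrightarrow} \mu_1$ and $\mu_1 \approx_p \mu_2$.

Two processes $Q_1, Q_2$ are probabilistically congruent, written $Q_1 \simeq_c Q_2$, if
$Q_1$ and $Q_2$ are both nondeterministic or both probabilistic, and for all $a \in Act$,

- if $Q_1 \xrightarrow{a} \mu_1$ then there exists $\mu_2$ such that $Q_2 \stackrel{a}{\Longrightarrow}_c \mu_2$ and $\mu_1\ (\approx_c)_p\ \mu_2$;

- if $Q_2 \xrightarrow{a} \mu_2$ then there exists $\mu_1$ such that $Q_1 \stackrel{a}{\Longrightarrow}_c \mu_1$ and $\mu_1\ (\approx_c)_p\ \mu_2$.

The only difference between congruence and weak bisimulation is that in the
former there is $\stackrel{a}{\Longrightarrow}$ instead of $\stackrel{\hat{a}}{\Longrightarrow}$. This implies that every $\tau$-transition of $Q_1$
is related with at least one $\tau$-transition of $Q_2$, and vice versa. Observe that this
strong relationship is requested only for the first transitions of both $Q_1$ and $Q_2$;
in fact, it is sufficient that $\mu_1 \approx_p \mu_2$, not $\mu_1 \simeq_p \mu_2$.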

Proposition 4. The relations $\simeq$ and $\simeq_c$ are congruences in PPA.

5 Axiomatizations 

5.1 Discussion of the Axioms 

The axioms that characterize completely the bisimulation relations of this paper
are listed in Table 3. The left side of Table 3 contains the axioms for the
non-alternating semantics of PPA, while the right part contains the axioms for the
alternating semantics of PPA. Table 3 is also subdivided into four horizontal
sections. The first and third sections axiomatize strong bisimulation. By adding the
second section we obtain complete axiomatizations for observation congruence.
Finally, by adding the fourth section we obtain complete axiomatizations for the
probabilistic versions of our bisimulations, where axiom CW holds only for the
weak relations. Thus, sections 1, 3 and 4 provide complete axiomatizations for
the strong probabilistic bisimulations.

Observe that there is no C axiom in the right column of Table 3, which
confirms that strong bisimulation is the same under randomized and
non-randomized schedulers in the alternating semantics. Furthermore, there is no
CW axiom in the left column of Table 3, which shows that randomization adds
some restricted power to the ability of simulating a weak transition in the
alternating model. Axiom CW does not hold in the non-alternating semantics since
the term $\tau.P$ reached after the $a$-labeled transition of $a.(P \oplus_p \Delta(\tau.P))$ cannot
be simulated in general by the distribution identified by $P$ in $a.P$. See also the
discussion about axiom A8.

Observe that the first and third sections of Table 3 contain the same axioms in
the two columns. This confirms that under strong bisimulation with deterministic
schedulers the alternating and non-alternating models are indeed the same. We
can observe a difference between alternating and non-alternating semantics in
the second section of Table 3. Specifically, axioms A6-7 of the right column are
more restrictive than the axioms of the left column ($P_i$ replaced by $P$). On the
other hand, axiom A8 holds only in the alternating semantics, thus showing that
weak bisimulations are incomparable. Axiom A8 expresses the informal idea that
in the alternating model each distribution must be declared before being drawn.
Thus, adding further declarations does not matter. The left version of axiom A5
can be replaced by its right version. We have kept both versions to illustrate
better the analogies with the $\tau$-laws of Milner.

Another important observation is that the axiomatizations of Table 3 keep
most of the structure of the axiomatizations for ordinary CCS. The axioms
of the first section are exactly the axioms for strong bisimulation on CCS, and
the axioms of the third section add the ingredients that are needed for the new
probabilistic choice operator. The $\tau$-laws of the second section have the same
structure as the $\tau$-laws of Milner, except that within a prefix we have the
probabilistic choice operator. If we consider processes without the probabilistic choice
operator, then our $\tau$-laws coincide with the $\tau$-laws of Milner.

5.2 Proof Sketches 

The proofs of the completeness results are similar to the corresponding proofs
for CCS: a process is reduced to a normal form, possibly saturated, and then
processes are compared almost syntactically, piece by piece. In this section we
give an overview of the normal forms that are needed in the proofs.

Definition 1. A nondeterministic process E is in normal form (NF) if

$E \equiv \sum_{i \in I} a_i . \bigoplus_{j \in J_i} [p_j] E_j$

where the processes $E_j$ are in normal form as well.

Bringing a process into normal form is almost immediate, since it is sufficient to
remove all redundant 0's by using axiom A4 and the congruence rules.

Definition 2. A nondeterministic process E is in strict normal form (SNF) if

$E \equiv \sum_{i \in I} a_i . \bigoplus_{j \in J_i} [p_j] E_j$

where, if $S \vdash E_j = E_{j'}$, then $j = j'$. With $S$ we denote the axioms of
the first and third sections of Table 3.
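Table 3. Axioms for strong and weak bisimulations

Axiom | Non-alternating semantics | Alternating semantics
------+---------------------------+----------------------
A1 | $E + F = F + E$ | $E + F = F + E$
A2 | $E + (F + G) = (E + F) + G$ | $E + (F + G) = (E + F) + G$
A3 | $E + E = E$ | $E + E = E$
A4 | $E + 0 = E$ | $E + 0 = E$
------+---------------------------+----------------------
A5 | $a.(\Delta(\tau.\Delta(E)) \oplus_p P) = a.(\Delta(E) \oplus_p P)$ | $\Delta(\tau.\Delta(E)) = \Delta(E)$
A6 | $\tau.\bigoplus_{i \in I}[p_i](E_i + a.P_i) + a.\bigoplus_{i \in I}[p_i]P_i = \tau.\bigoplus_{i \in I}[p_i](E_i + a.P_i)$ | $\tau.\bigoplus_{i \in I}[p_i](E_i + a.P) + a.P = \tau.\bigoplus_{i \in I}[p_i](E_i + a.P)$
A7 | $a.\bigoplus_{i \in I}[p_i](E_i + \tau.P_i) + a.\bigoplus_{i \in I}[p_i]P_i = a.\bigoplus_{i \in I}[p_i](E_i + \tau.P_i)$ | $a.\bigoplus_{i \in I}[p_i](E_i + \tau.P) + a.P = a.\bigoplus_{i \in I}[p_i](E_i + \tau.P)$
A8 | - | $a.P = a.\Delta(\tau.P)$
------+---------------------------+----------------------
P1 | $P \oplus_p Q = Q \oplus_{1-p} P$ | $P \oplus_p Q = Q \oplus_{1-p} P$
P2 | $P \oplus_{p_1} (Q \oplus_{\frac{p_2}{1-p_1}} R) = (P \oplus_{\frac{p_1}{p_1+p_2}} Q) \oplus_{p_1+p_2} R$ | $P \oplus_{p_1} (Q \oplus_{\frac{p_2}{1-p_1}} R) = (P \oplus_{\frac{p_1}{p_1+p_2}} Q) \oplus_{p_1+p_2} R$
P3 | $P \oplus_p P = P$ | $P \oplus_p P = P$
------+---------------------------+----------------------
C | $a.P_1 + a.P_2 = a.P_1 + a.P_2 + a.(P_1 \oplus_p P_2)$ | -
CW | - | $a.(P \oplus_p \Delta(\tau.P)) = a.P$
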

To get a process in strict normal form we first convert the process to normal
form. Then, whenever we find two elements $E_j$ and $E_{j'}$ that are provably
equivalent, we use axiom P3 to collapse them. Of course we need also axioms P1 and
P2 to get the two terms next to each other.
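
The reassociation performed by axiom P2 can be sanity-checked numerically. The sketch below is only an illustration, not part of the completeness proof: it assumes that P2 reads P ⊕_{p1} (Q ⊕_{p2/(1-p1)} R) = (P ⊕_{p1/(p1+p2)} Q) ⊕_{p1+p2} R, represents distributions as Python dictionaries, and uses the hypothetical helpers `choice` and `dirac`, which are names introduced here.

```python
from fractions import Fraction

def choice(p, left, right):
    """Distribution of `left (+)_p right`: left with probability p, right with 1 - p."""
    out = {}
    for weight, dist in ((p, left), (1 - p, right)):
        for outcome, prob in dist.items():
            out[outcome] = out.get(outcome, 0) + weight * prob
    return out

def dirac(name):
    """Distribution that yields `name` with probability 1."""
    return {name: Fraction(1)}

# Arbitrary illustrative probabilities with p1 + p2 < 1.
p1, p2 = Fraction(1, 5), Fraction(3, 10)
P, Q, R = dirac("P"), dirac("Q"), dirac("R")

lhs = choice(p1, P, choice(p2 / (1 - p1), Q, R))
rhs = choice(p1 + p2, choice(p1 / (p1 + p2), P, Q), R)
```

Both sides assign probability p1 to P, p2 to Q and 1 - p1 - p2 to R, which is why P1 and P2 together suffice to bring any two summands of a probabilistic choice next to each other.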

Processes in strict normal form are sufficient for the proof of completeness
for strong bisimulation, which works prefix by prefix as for CCS. To handle strong
probabilistic bisimulation we use axiom C to build the missing summands that
originate from convex combinations of other summands. Thus, we reduce strong
probabilistic bisimulation to strong bisimulation.

To deal with weak bisimulation we need to saturate a process as for CCS. For
this purpose we define complete normal forms.

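Definition 3. A nondeterministic process E is in complete normal form (CNF) if

- $E \equiv \sum_{i \in I} a_i . \bigoplus_{j \in J_i} [p_j] E_j$

- the processes $E_j$ are in CNF

- if $E \stackrel{a}{\Rightarrow} P$, then $E \stackrel{a}{\rightarrow} P$.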

The saturation process to get an expression in complete normal form consists
of using axiom A6 to move out of a $\tau$-prefix each transition labeled by some
external action. The final step is to get a strict complete normal form in the same
way as we do for strong bisimulation. When axiomatizing weak probabilistic
bisimulation, once again we use axiom C to create the missing summands.

The normal form for weak bisimulation in the alternating semantics differs 
from the normal form in the non-alternating semantics in that axiom A6 allows 
us to saturate only those transitions that lead to Dirac distributions. 

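Definition 4. A nondeterministic process E is in alternating complete normal
form (ACNF) if

- $E \equiv \sum_{i \in I} a_i . \bigoplus_{j \in J_i} [p_j] E_j$

- the processes $E_j$ are in ACNF

- if $E \stackrel{a}{\Rightarrow} \delta(P)$, then $E \stackrel{a}{\rightarrow} \delta(P)$.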

6 Concluding Remarks 

We have studied axiomatizations of bisimulation relations for a recursion-free
fragment of a probabilistic process algebra that includes probabilistic and
nondeterministic choices. Our analysis included strong and weak bisimulation,
deterministic and randomized schedulers, and alternating and non-alternating semantics.

The axioms have a structure consistent with the original axioms of Milner and 
separate clearly the concerns of nondeterminism and probability. The axiomati- 
zations that we have found also highlight the main differences and similarities of 
the alternating and non-alternating models of concurrent probabilistic systems. 

We are currently planning to extend our axiomatizations to a probabilistic 
process algebra with recursion and parallel composition. We do not expect any 
special surprises with parallel composition since a probabilistic generalization of 
the expansion law of Milner is easy to derive. 
Abstract. We propose a type system to ensure the property of noninterference
in a system of concurrent programs, described in a standard
imperative language extended with parallelism. Our proposal is in the 
line of some recent work by Irvine, Volpano and Smith. Our type system, 
as well as our semantics for concurrent programs, seem more natural and 
less restrictive than those originally presented by these authors. More- 
over, we show how to adapt the type system in order to preserve the 
noninterference results in the presence of scheduling policies, while re- 
maining in a nonprobabilistic setting. 



1 Introduction 

The aim of this paper is to study the notion of secure information flow, and
more specifically of noninterference (a notion first introduced by Goguen and
Meseguer), in the setting of concurrency. Our starting point is the paper
by Volpano, Smith and Irvine, and the subsequent paper [12] by Smith
and Volpano, where noninterference is enforced by means of a simple type system
in an imperative language with security levels. The language considered by
Volpano, Smith and Irvine is purely sequential, and is extended in [12] with
asynchronous parallelism (interleaving). In this introduction, and in the examples
given in the paper, the security levels will simply be high and low. High-level
variables are supposed to contain secret information, while low-level variables
contain public information. However, all results will be given for an arbitrary
lattice of security levels.

In Volpano et al.’s work, noninterference means that variables of a given level 
do not interfere with those of lower levels: more precisely, the values of low-level 
variables are not dependent on the values of high-level variables. Noninterference 
is meant to model the absence of information flow from high level to low level. 
Such information flow is considered insecure, as it amounts to the disclosure of 
secret information into the public domain. Insecure flow can be explicit, when 
assigning the value of a high variable to a low variable, or implicit, when testing 
the value of a high variable and then assigning to a low variable, for instance. 
In the approach of Volpano et al., these situations are prevented by means of a type
system. More precisely, explicit flow is prevented by requiring that the level of the 
assigned variable be at least as high as that of the source variable, while implicit 
flow is prevented by asking that the level of the commands in the branches of a 
conditional (the level of a command being that of its lowest assigned variables) 

* Research partially funded by the EU Working Group CONFER II and by the french 
RNRT Project MARVEL. 

F. Orejas, P.G. Spirakis, and J. van Leeuwen (Eds.): ICALP 2001, LNCS 2076, pp. 382-^^^ 2001. 
(c) Springer- Verlag Berlin Heidelberg 2001 
$\gamma$ : if PIN = 0 then $t_\beta$ := tt else $t_\alpha$ := tt

$\alpha$ : while $t_\alpha \neq$ tt do nil ; $r$ := 0 ; $t_\beta$ := tt

$\beta$ : while $t_\beta \neq$ tt do nil ; $r$ := 1 ; $t_\alpha$ := tt

PIN, $t_\alpha$, $t_\beta$ : boolean variables of type H
$r$ : boolean variable of type L
$\gamma$ : thread of type H; $\alpha$, $\beta$ : threads of type L

Fig. 1. Information Flow through Control Flow



be at least as high as that of the tested variable. Implicit flow can also arise in 
while-loops, and is prevented by a similar condition on the type of the body of 
the loop. 

In fact, because of while-loops, the definition of noninterference is more precise
than what is stated above: it says that no change in the values of low-level
variables should be observed as a consequence of a change in high-level variables,
provided that the program terminates successfully. Using subscripts to explicitly
indicate the security level of a variable, consider the following program, which
terminates if $x_H \neq 0$ and loops forever (doing nothing) otherwise:

while $x_H = 0$ do nil ; $y_L := 1$    (1)

Should this program be accepted, that is, should it be typable? According to 
the above definition of noninterference the answer is “yes”, since whenever the 
program terminates it produces the same value $y_L = 1$ for its low-level variable.
Indeed, this program is typable in Volpano and Smith’s type system, since the 
loop is typable and the sequential composition of typable programs is always 
typable. 

However, accepting such a program leads to problems when parallelism is
introduced in the language. These problems can be concisely described as
“disguising information flow as control flow”. Let us illustrate the problem by means
of an example, which is a simplified version of the PIN example given by Smith
and Volpano in [12]. In this example, given in Figure 1, three threads $\alpha$, $\beta$ and
$\gamma$ are run (asynchronously) in parallel. There are four variables: a high-level
variable PIN tested by thread $\gamma$, two high-level variables $t_\alpha$ and $t_\beta$ serving as
“triggers” for threads $\alpha$ and $\beta$, and a low-level variable $r$ written by $\alpha$ and $\beta$. As
can be easily seen, with initial values $t_\alpha = t_\beta = \mathit{ff}$ the effect of the program is to
copy the value of the secret variable PIN into the public variable $r$. The illicit
information flow from PIN to $r$ is implemented through the control flow from $\gamma$
to $\alpha$ or $\beta$. However, if we assume that a system of concurrent threads is typable
provided each component is typable, this particular system is to be accepted.
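
The leak can be checked mechanically by exploring every interleaving of the three threads. The following Python sketch is an illustration under stated assumptions, not the authors' formal semantics: each thread is encoded by hand as a small program-counter machine, the conditional in the high thread is treated as a single atomic step, PIN ranges over 0 and 1, and the names `step`, `final_r_values`, `NAMES` and `END` are introduced here. The thread bodies assume that the high thread sets the trigger of the thread writing 1 when PIN = 0, and that each low thread writes its bit and then triggers the other one.

```python
NAMES = ("gamma", "alpha", "beta")
END = {"gamma": 1, "alpha": 3, "beta": 3}   # program counter value at termination

def step(name, pc, mem):
    """Perform one atomic step of thread `name`; return (new_pc, new_memory)."""
    mem = dict(mem)
    if name == "gamma":        # if PIN = 0 then t_beta := tt else t_alpha := tt
        mem["t_beta" if mem["PIN"] == 0 else "t_alpha"] = True
        return 1, mem
    mine, other, bit = (("t_alpha", "t_beta", 0) if name == "alpha"
                        else ("t_beta", "t_alpha", 1))
    if pc == 0:                # while t_mine != tt do nil (busy waiting)
        return (1 if mem[mine] else 0), mem
    if pc == 1:                # r := bit
        mem["r"] = bit
        return 2, mem
    mem[other] = True          # pc == 2: t_other := tt
    return 3, mem

def final_r_values(pin):
    """Values of r over all terminated runs, for every possible interleaving."""
    start_mem = {"PIN": pin, "t_alpha": False, "t_beta": False, "r": -1}
    start = ((0, 0, 0), tuple(sorted(start_mem.items())))
    seen, stack, finals = set(), [start], set()
    while stack:
        state = stack.pop()
        if state in seen:      # busy-waiting loops revisit states
            continue
        seen.add(state)
        pcs, items = state
        mem = dict(items)
        if all(pcs[i] == END[n] for i, n in enumerate(NAMES)):
            finals.add(mem["r"])
            continue
        for i, n in enumerate(NAMES):
            if pcs[i] != END[n]:
                pc2, mem2 = step(n, pcs[i], mem)
                new_pcs = pcs[:i] + (pc2,) + pcs[i + 1:]
                stack.append((new_pcs, tuple(sorted(mem2.items()))))
    return finals
```

Under these assumptions every terminated run ends with r equal to PIN, whatever the interleaving, so the public variable reveals the secret although no thread ever assigns a high expression to r.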

To circumvent this problem, Smith and Volpano propose in [12] to forbid the
use of high-level variables as guards in while-loops, that is, assuming that there
is a lowest security level, to accept only while-loops of low level. While ruling
out the program in (1), and also the threads $\alpha$ and $\beta$ of the PIN example, this
solution seems a bit drastic. It excludes inoffensive programs such as while $x_H = 0$
do nil. We shall propose here a different solution to the problem raised by
while-loops in the presence of parallelism, which allows this program to be typed,
while ruling out the programs of example (1) and Figure 1. Our solution is based
on the observation that a program such as

while $x_H = 0$ do nil

should indeed be considered with some care in a concurrent setting, but only as
a “guard”, that is, as regards what may follow it. In the context of concurrent
threads, if the control comes back to this while loop, this may be with a value for
$x_H$ different from 0, contrary to what happens in a sequential setting. In other
words, this program may observe the behaviour of other, concurrent components,
in the course of their execution, and influence accordingly the behaviour of the
thread in which it participates. Technically, this means that we will abandon the
big-step semantics which is the basis of Volpano et al.’s analysis in favor of a
small-step semantics for programs, which is the approach usually adopted in dealing
with parallelism. Our aim is then to ensure a stronger form of noninterference,
where the course of values - not just the final value - of a low-level variable does
not depend upon the value of high variables. Typically, the program (1) is no
longer interference-free in this stronger sense. In order to reject it, we introduce
a refinement of the type system, where the level of a guard - the expression
tested by a while loop - is taken into account in sequential composition.

We will also examine the situation where a scheduling policy is in force in a 
thread system: we will introduce a few new programming primitives to describe 
formally such a situation, and show how to adapt the type system for this new 
setting, where new interference phenomena arise. As can be expected, this will 
result in a slight restriction on the type of certain programs, though not as severe 
as that prefigured in [12].

The rest of the paper is organised as follows. In Section 2 we introduce the
language, its operational semantics and its type system. Section 3 presents the
properties of typed programs, including subject reduction and noninterference.
Finally, in Section 4 we consider the extended language with scheduling policies.
The proofs are omitted from this extended abstract. They are to be found in the
full version of the paper [2].

2 The Language and Type System 

The language we consider is essentially that of [12] (where e stands for a boolean
or arithmetic expression, whose syntax we do not detail here). We use the following
two-level syntax, where U, V denote sequential programs, while P, Q denote
general (concurrent) programs:

U, V ::= nil | x := e | U;V | if e then U else V | while e do U

P, Q ::= U | U;P | if e then P else Q | while e do P | P ∥ Q
Note that on the left of a sequential composition, we must have a sequential
program. Thus programs of the form (P ∥ Q);R are not allowed. With this
restriction, our language is still more general than that of Smith and Volpano, which
describes concurrent systems as collections of threads, thus allowing only top-level
parallelism, while we allow the dynamic spawning of new threads.

The operational semantics of the language is given in terms of transitions
between configurations $(P, \mu) \to (P', \mu')$ where $P, P'$ are programs and $\mu, \mu'$
stand for memories, that is mappings from variables to values. These mappings
are extended in the obvious way to expressions, whose evaluation is assumed to
be atomic as in the work of Volpano et al. We use the notation $\mu[v/x]$ for memory
update. The rules specifying the operational semantics of programs are presented in
Figure 2. As pointed out already in the introduction, the semantics used here is a
small-step semantics, as opposed to the big-step semantics of Volpano et al.¹ The
rules are fairly standard, and we shall not comment on them.
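
To make the small-step reading concrete, here is a minimal Python sketch of a transition function for this language. It is an illustration, not the authors' definition: programs are encoded as tagged tuples, expressions as Python functions from memories to values, and the names `steps` and `run` are introduced here. The case for a non-nil left component of a sequential composition is written in the standard way, which is an assumption.

```python
def steps(prog, mem):
    """Return the list of configurations reachable from (prog, mem) in one step."""
    tag = prog[0]
    if tag == "assign":                      # (x := e, mu) -> (nil, mu[mu(e)/x])
        _, x, e = prog
        return [(("nil",), {**mem, x: e(mem)})]
    if tag == "seq":
        _, u, p = prog
        if u[0] == "nil":                    # nil;P moves as P does
            return steps(p, mem)
        return [(("seq", u2, p), m2) for u2, m2 in steps(u, mem)]
    if tag == "if":                          # branch on the value of the guard
        _, e, p, q = prog
        return [(p if e(mem) else q, mem)]
    if tag == "while":                       # unfold once, or exit to nil
        _, e, p = prog
        return [(("seq", p, prog), mem)] if e(mem) else [(("nil",), mem)]
    if tag == "par":                         # interleaving: either side may move
        _, p, q = prog
        return ([(("par", p2, q), m2) for p2, m2 in steps(p, mem)] +
                [(("par", p, q2), m2) for q2, m2 in steps(q, mem)])
    return []                                # nil has no transition

def run(prog, mem, fuel=100):
    """Follow the first enabled transition for at most `fuel` steps."""
    for _ in range(fuel):
        successors = steps(prog, mem)
        if not successors:
            break
        prog, mem = successors[0]
    return prog, mem

# Program (1): while x_H = 0 do nil ; y_L := 1
program_1 = ("seq",
             ("while", lambda m: m["xH"] == 0, ("nil",)),
             ("assign", "yL", lambda m: 1))
```

With `xH` different from 0 the run reaches nil with `yL` set to 1; with `xH` equal to 0 the loop keeps unfolding and `yL` is never written, which is the difference in the course of values that a small-step observer can see.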

In the introduction we argued that, in a small-step semantics, the
program (1) should be treated as another case of implicit information flow.
Intuitively, when exiting a loop one gets some information about its guard; it seems
then appropriate to require that what follows the loop - its “continuation” -
have level at least as high as that of the loop guard. This will be the basic idea
of our new type system, which is closely inspired by that given by Volpano et
al. However, as suggested by the above example, it will be more restrictive
than theirs on the sequential sublanguage, because of our more detailed
observation of programs.

The types of data and expressions are security levels, that is elements of a
lattice $(\mathcal{S}, \leq)$. We denote the operations of meet and join respectively by $\sqcap$ and $\sqcup$.
These types are ranged over by $\tau, \sigma$. In the examples, the lattice of security levels
will simply be $\{L, H\}$, with $L \leq H$. The types of variables (when used in the
left-hand side of an assignment) are of the form $\tau$ var. Our first point of departure
from Volpano et al. concerns the types for programs. Type judgements there are of the
form $\Gamma \vdash P : \tau$ cmd, where $\Gamma$ is a mapping from variables to types of variables,
i.e. elements of $\{\tau\ \mathit{var} \mid \tau \in \mathcal{S}\}$. The meaning of $\Gamma \vdash P : \tau$ cmd is that in the
type environment $\Gamma$, $\tau$ is a lower bound for the level of the assigned variables
of $P$. In line with this intuition, subtyping for programs is contravariant, that is
$\tau$ cmd $\leq$ $\tau'$ cmd if $\tau' \leq \tau$. Thus for instance any program of type $H$ cmd can
be downgraded to type $L$ cmd. A program of type $H$ cmd is guaranteed not to
contain any assignment to a low variable.

To take into account loop guards, we shall use here more refined types
$(\tau, \sigma)$ cmd, where the first component $\tau$ plays the same role as in the type $\tau$ cmd,
while the second component $\sigma$ is the guard type, an upper bound on the level of
the loop guards occurring in a program. Accordingly, the subtyping for programs
is contravariant in its first component and covariant in the second:

$(\tau, \sigma)$ cmd $\leq$ $(\tau', \sigma')$ cmd if $\tau' \leq \tau$ and $\sigma \leq \sigma'$

¹ In fact, the semantics of [12] is a mixture of small and big step semantics:
transitions are given between configurations but there are two kinds of configurations,
intermediate and final ones, suggesting that termination should be observed.
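(Assign-Op)   $(x := e, \mu) \to (\text{nil}, \mu[\mu(e)/x])$

(Seq-Op1)   $\frac{(U, \mu) \to (U', \mu')}{(U;P, \mu) \to (U';P, \mu')}$

(Seq-Op2)   $\frac{(P, \mu) \to (P', \mu')}{(\text{nil};P, \mu) \to (P', \mu')}$

(Cond-Op1)   $\frac{\mu(e) = \mathit{tt}}{(\text{if } e \text{ then } P \text{ else } Q, \mu) \to (P, \mu)}$

(Cond-Op2)   $\frac{\mu(e) \neq \mathit{tt}}{(\text{if } e \text{ then } P \text{ else } Q, \mu) \to (Q, \mu)}$

(While-Op1)   $\frac{\mu(e) = \mathit{tt}}{(\text{while } e \text{ do } P, \mu) \to (P; \text{while } e \text{ do } P, \mu)}$

(While-Op2)   $\frac{\mu(e) \neq \mathit{tt}}{(\text{while } e \text{ do } P, \mu) \to (\text{nil}, \mu)}$

(Par-Op1)   $\frac{(P, \mu) \to (P', \mu')}{(P \parallel Q, \mu) \to (P' \parallel Q, \mu')}$

(Par-Op2)   $\frac{(Q, \mu) \to (Q', \mu')}{(P \parallel Q, \mu) \to (P \parallel Q', \mu')}$

Fig. 2. Operational Semantics for Parallel Programs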



The guard type will be set up by while-loops and looked up by sequential
composition. The complete type system for programs is shown in Figure 3. Notice that
the guard type plays no particular role in rules (Nil), (Assign) and (Cond),
which are plain adaptations of the ones of Volpano et al. Let us comment a little on the
rules for while-loops and sequential composition, which are the main novelty
w.r.t. their system. As explained, the guard type is $\sigma$ for a while-loop testing an
expression of level $\sigma$, and from then onwards it should stay equal to $\sigma$ to prevent
concatenation with low-level programs. Rule (Seq) is precisely designed to avoid
sequencing “low” assignments after a program with “high” guards. This rules
out the kind of implicit flow exhibited by the program (1). One may notice that
rule (While) imposes types of the form $(\tau, \tau)$ cmd on while-loops (by subtyping
they also have types $(\theta, \sigma)$ cmd with $\theta \leq \sigma$). We let the reader check that, had
we accepted for instance $(H, L)$ cmd, we would not avoid interferences, as shown
by the example
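(Nil)   $\Gamma \vdash \text{nil} : (\tau, \sigma)$ cmd

(Assign)   $\frac{\Gamma \vdash e : \tau, \quad \Gamma(x) = \tau\ \mathit{var}}{\Gamma \vdash x := e : (\tau, \sigma)\ \mathit{cmd}}$

(Seq)   $\frac{\Gamma \vdash U : (\tau, \sigma)\ \mathit{cmd}, \quad \Gamma \vdash P : (\tau', \sigma')\ \mathit{cmd}, \quad \sigma \leq \tau'}{\Gamma \vdash U;P : (\tau \sqcap \tau', \sigma \sqcup \sigma')\ \mathit{cmd}}$

(Cond)   $\frac{\Gamma \vdash e : \tau, \quad \Gamma \vdash P : (\tau, \sigma)\ \mathit{cmd}, \quad \Gamma \vdash Q : (\tau, \sigma)\ \mathit{cmd}}{\Gamma \vdash \text{if } e \text{ then } P \text{ else } Q : (\tau, \sigma)\ \mathit{cmd}}$

(While)   $\frac{\Gamma \vdash e : \tau, \quad \Gamma \vdash P : (\tau, \tau)\ \mathit{cmd}}{\Gamma \vdash \text{while } e \text{ do } P : (\tau, \tau)\ \mathit{cmd}}$

(Par)   $\frac{\Gamma \vdash P : (\tau, \sigma)\ \mathit{cmd}, \quad \Gamma \vdash Q : (\tau, \sigma)\ \mathit{cmd}}{\Gamma \vdash P \parallel Q : (\tau, \sigma)\ \mathit{cmd}}$

(Subtyping)   $\frac{\Gamma \vdash P : (\tau, \sigma)\ \mathit{cmd}, \quad \tau' \leq \tau, \quad \sigma \leq \sigma'}{\Gamma \vdash P : (\tau', \sigma')\ \mathit{cmd}}$

Fig. 3. Typing Rules for Concurrent Programs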



if $x_H = 0$ then while $y_L = 0$ do nil else nil ; $y_L := y_L + 1$    (2)

Similarly, we have to rule out the insecure program

while tt do ($y_L := y_L + 1$ ; while $x_H = 0$ do nil)    (3)

and this shows why loops having $(L, H)$ cmd as their unique type should be
forbidden.
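
The effect of the rules for loops and sequential composition can be explored with a small checker. The Python sketch below is an illustration under stated assumptions, not the authors' algorithm: it fixes the two-point lattice L ≤ H, computes for each program the set of all pairs (τ, σ) such that the program has type (τ, σ) cmd (closing under subtyping at each step), and assumes that an expression can be given any level above the levels of the variables it reads. Programs are tagged tuples, an expression is represented only by the list of variables it reads, and `types`, `close` and `level` are names introduced here.

```python
from itertools import product

L, H = 0, 1                                  # two-point lattice with L <= H
LEVELS = (L, H)
ALL = frozenset(product(LEVELS, LEVELS))     # every pair (tau, sigma)

def close(pairs):
    """Subtyping closure: lower the first component, raise the second."""
    return frozenset((t2, s2) for (t, s) in pairs
                     for (t2, s2) in ALL if t2 <= t and s <= s2)

def level(expr_vars, gamma):
    """Least level of an expression that reads the variables `expr_vars`."""
    return max((gamma[v] for v in expr_vars), default=L)

def types(prog, gamma):
    """Set of pairs (tau, sigma) such that prog has type (tau, sigma) cmd."""
    tag = prog[0]
    if tag == "nil":
        return ALL
    if tag == "assign":
        _, x, e = prog
        if level(e, gamma) > gamma[x]:       # explicit flow from high to low
            return frozenset()
        return close({(gamma[x], s) for s in LEVELS})
    if tag == "seq":                         # side condition: sigma <= tau'
        _, u, p = prog
        return close({(min(t, t2), max(s, s2))
                      for (t, s) in types(u, gamma)
                      for (t2, s2) in types(p, gamma) if s <= t2})
    if tag == "if":
        _, e, p, q = prog
        guard = level(e, gamma)
        return close({(t, s) for (t, s) in types(p, gamma) & types(q, gamma)
                      if guard <= t})
    if tag == "while":                       # body and loop typed (tau, tau)
        _, e, p = prog
        guard = level(e, gamma)
        return close({(t, s) for (t, s) in types(p, gamma)
                      if t == s and guard <= t})
    if tag == "par":
        _, p, q = prog
        return close(types(p, gamma) & types(q, gamma))
    raise ValueError(tag)

gamma = {"xH": H, "yL": L}
high_loop = ("while", ["xH"], ("nil",))              # while x_H = 0 do nil
low_write = ("assign", "yL", ["yL"])                 # y_L := y_L + 1
program_1 = ("seq", high_loop, ("assign", "yL", [])) # program (1)
program_3 = ("while", [], ("seq", low_write, high_loop))  # program (3)
```

In this sketch the loop on a high guard is typable by itself, a low assignment placed before it yields the single type (L, H), and both program (1) and program (3) receive an empty set of types, in agreement with the discussion above.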

3 Properties of Typed Programs 

In this section we prove some desired properties of our type system. The first 
property, subject reduction, states that types are preserved along execution. 

Theorem 3.1. (Subject Reduction) 

If $\Gamma \vdash P : (\tau, \sigma)$ cmd and $(P, \mu) \to (P', \mu')$, then $\Gamma \vdash P' : (\tau, \sigma)$ cmd.

Proof: By induction on the inference of $\Gamma \vdash P : (\tau, \sigma)$ cmd, and then case
analysis on the last rule used in this inference. □







G. Boudol and I. Castellani 



We shall use the following assumptions about expressions:

Assumption 3.2 (Termination of Expression Evaluation)

For any memory $\mu$ and expression $e$, the value $\mu(e)$ is defined.

Assumption 3.3 (Simple Security)

If $\Gamma \vdash e : \tau$, then every variable occurring in $e$ has type $\tau'$ var in $\Gamma$, with $\tau' \leq \tau$.

We introduce now, for any type environment $\Gamma$ and security level $\omega$, a notion
of equality on memories which formalises the idea that two memories coincide
on variables of level less than or equal to $\omega$ in $\Gamma$. Intuitively, such memories are
indistinguishable for an observer of level $\omega$.

Definition 3.4 ($\omega$-Equality of Memories)

$\mu =^{\Gamma}_{\omega} \nu \iff_{\mathrm{def}} \forall x.\ \Gamma(x) = \tau\ \mathit{var}\ \&\ \tau \leq \omega \Rightarrow \mu(x) = \nu(x)$.
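
As a concrete reading of this definition, the following Python sketch compares two memories on the variables visible at a given level. It is only an illustration: levels are encoded as the integers 0 (L) and 1 (H), so that the lattice order is the usual order on integers, and `low_equal` is a name introduced here.

```python
L, H = 0, 1   # assumed encoding of the two-point lattice, L <= H

def low_equal(mu, nu, gamma, omega):
    """True if mu and nu agree on every variable whose level is at most omega."""
    return all(mu[x] == nu[x] for x, lvl in gamma.items() if lvl <= omega)

gamma = {"xH": H, "yL": L}
mu = {"xH": 0, "yL": 5}
nu = {"xH": 7, "yL": 5}
```

Here `mu` and `nu` differ only on the high variable, so they are indistinguishable for an observer of level L but not for one of level H.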

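Definition 3.5 ($(\Gamma, \omega)$-Bisimulation)

A relation $\mathcal{R}$ on configurations is a $(\Gamma, \omega)$-bisimulation if $(P, \mu)\ \mathcal{R}\ (Q, \nu)$ implies

(i) $\mu =^{\Gamma}_{\omega} \nu$

(ii) $(P, \mu) \to (P', \mu') \Rightarrow \exists Q', \nu'.\ (Q, \nu) \to^{*} (Q', \nu') \wedge (P', \mu')\ \mathcal{R}\ (Q', \nu')$

(iii) $(Q, \nu) \to (Q', \nu') \Rightarrow \exists P', \mu'.\ (P, \mu) \to^{*} (P', \mu') \wedge (P', \mu')\ \mathcal{R}\ (Q', \nu')$

The $(\Gamma, \omega)$-bisimulation equivalence on configurations, noted $\approx^{\Gamma}_{\omega}$, is the largest
$(\Gamma, \omega)$-bisimulation.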

The following two lemmas confirm the intuition, discussed earlier, behind the
type judgements $\Gamma \vdash P : (\tau, \sigma)$ cmd.

Lemma 3.6 (Confinement)

If $\Gamma \vdash P : (\tau, \sigma)$ cmd then every variable assigned to in $P$ has type $\theta$ var in $\Gamma$,
with $\tau \leq \theta$.

Lemma 3.7 (Guard Safety)

If $\Gamma \vdash P : (\tau, \sigma)$ cmd then every loop guard in $P$ has type $\theta$ in $\Gamma$, with $\theta \leq \sigma$.

Definition 3.8 ($\omega$-Boundedness)

A program $P$ is $\omega$-bounded if $\Gamma \vdash P : (\tau, \sigma)$ cmd implies $\tau \leq \omega$.

Definition 3.9 ($\omega$-Guardedness)

A program $P$ is $\omega$-guarded if there exist $\tau, \sigma$, with $\sigma \leq \omega$, such that $\Gamma \vdash P :
(\tau, \sigma)$ cmd.

Note that by the Confinement Lemma, a program which is not $\omega$-bounded cannot
write on variables of level less than or equal to $\omega$. Similarly, by the Guard Safety
Lemma, a program which is $\omega$-guarded does not contain loop guards of level
higher than or incomparable with $\omega$. As a consequence of subject reduction,
both non-$\omega$-boundedness and $\omega$-guardedness are preserved by execution.




Noninterference for Concurrent Programs 






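Proposition 3.10 (Bisimilarity of Non $\omega$-Bounded Programs)

Let $\mathcal{S}^{\Gamma}_{\omega}$ be the relation consisting of the pairs $((P, \mu), (Q, \nu))$ such that $\mu =^{\Gamma}_{\omega} \nu$
and there exist $\tau, \sigma$ and $\tau', \sigma'$ with $\tau \not\leq \omega$, $\tau' \not\leq \omega$, such that $\Gamma \vdash P : (\tau, \sigma)$ cmd
and $\Gamma \vdash Q : (\tau', \sigma')$ cmd. Then $\mathcal{S}^{\Gamma}_{\omega}$ is a $(\Gamma, \omega)$-bisimulation.

Proof: Let $((P, \mu), (Q, \nu)) \in \mathcal{S}^{\Gamma}_{\omega}$ and $(P, \mu) \to (P', \mu')$. This can be matched
by $(Q, \nu) \to^{*} (Q, \nu)$, since by the Confinement Lemma $\mu' =^{\Gamma}_{\omega} \mu =^{\Gamma}_{\omega} \nu$, and by the
Subject Reduction Theorem $\Gamma \vdash P' : (\tau, \sigma)$ cmd. □

Note that $(\Gamma, \omega)$-bisimilarity does not preserve termination. For instance, for any
memories $\mu$ and $\nu$ such that $\mu =^{\Gamma}_{\omega} \nu$ we have:

$(\text{nil}, \mu) \approx^{\Gamma}_{\omega} (\text{while tt do nil}, \nu)$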

We introduce now a notion which will play a key role for noninterference. 

Definition 3.11 ($\omega$-Constrainment)

A program $P$ is $\omega$-constrained if there exist $\tau, \sigma$, with $\tau \not\leq \omega$ and $\sigma \leq \omega$, such
that $\Gamma \vdash P : (\tau, \sigma)$ cmd.

By definition any $\omega$-constrained program is both $\omega$-guarded and not $\omega$-bounded.
It is worth stressing that the converse is not true, as shown by the program
while tt do nil. Clearly, for any type environment $\Gamma$, this program is
$\omega$-guarded for any security level $\omega$ and not $\omega$-bounded if $\omega \neq \top$. However it is not
$\omega$-constrained, as a consequence of the uniform typing in rule (While). Indeed,
an important property of $\omega$-constrained programs is the following.

Lemma 3.12 (Termination of $\omega$-Constrained Sequential Programs)

If $U$ is $\omega$-constrained, then for any $\mu$ there exist $\mu', U'$ such that $(U, \mu) \to^{*}
(U', \mu')$ and $U'$ = nil ; ··· ; nil.

Finally we can state our main result: 

Theorem 3.13. (Noninterference) 

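If $P$ is typable in $\Gamma$, then $(P, \mu) \approx^{\Gamma}_{\omega} (P, \nu)$ for any $\mu, \nu$ such that $\mu =^{\Gamma}_{\omega} \nu$.

Proof: We define inductively the relation $\mathcal{R}_0^{\Gamma,\omega}$ on configurations as follows:
$(P, \mu)\ \mathcal{R}_0^{\Gamma,\omega}\ (Q, \nu)$ if and only if $P$ and $Q$ are typable, $\mu =^{\Gamma}_{\omega} \nu$ and one of the
following holds:

1. $(P, \mu)\ \mathcal{S}^{\Gamma}_{\omega}\ (Q, \nu)$

2. $P = Q$ and $P$ is $\omega$-bounded

3. $P = U;R$ and $Q = V;R$, where both $U$ and $V$ are $\omega$-constrained

4. $P = U;R$ and $Q = V;R$, where $(U, \mu)\ \mathcal{R}_0^{\Gamma,\omega}\ (V, \nu)$ and $R$ is not $\omega$-bounded

5. $P$ is not $\omega$-bounded and $Q = V;R$, where $(\text{nil}, \mu)\ \mathcal{R}_0^{\Gamma,\omega}\ (V, \nu)$ and $R$ is not
$\omega$-bounded (or symmetrically)

6. $P = P_1 \parallel P_2$ and $Q = Q_1 \parallel Q_2$ with $(P_i, \mu)\ \mathcal{R}_0^{\Gamma,\omega}\ (Q_i, \nu)$.

We show that $\mathcal{R}_0^{\Gamma,\omega}$ is a $(\Gamma, \omega)$-bisimulation. The theorem will be a consequence
of this fact, since if $P$ is typable, then either $P$ is not $\omega$-bounded, in which case
$(P, \mu)\ \mathcal{S}^{\Gamma}_{\omega}\ (P, \nu)$ by Proposition 3.10, or $P$ is $\omega$-bounded and $(P, \mu)\ \mathcal{R}_0^{\Gamma,\omega}\ (P, \nu)$
by the second clause of the definition. In the case of Clause 3, we use
Lemma 3.12. □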






4 Adding a Scheduler 

As pointed out by Smith and Volpano in [12], noninterference results such as
those of the previous section rely on the hypothesis of a purely nondeterministic
execution of concurrent programs. These results would break down if particular
scheduling policies were enforced. We recall the example given in [12]. Assume
a round robin time slicing scheduler, with a time slice of $t$ steps, $t \geq 2$, and
consider the composition $P = \alpha \parallel \beta$ of the following two threads:

$\alpha$ : if $x_H = 0$ then $Q$ else nil ; $y_L := 0$    (4)

$\beta$ : $y_L := 1$

Then, supposing that $Q$ is a convergent program that takes at least $t - 1$ steps
to execute, and that the scheduler gives precedence to $\alpha$, the value of $y_L$ will
depend on that of $x_H$. The solution proposed in [12] to preserve noninterference
in the presence of an arbitrary scheduler consists in forbidding conditionals with
high guards², that is, again assuming that there is a lowest security level, to
accept only conditional branching on low level expressions. This condition,
combined with the exclusion of loops with high guards, required for multi-threading,
resulted in [12] in a very severe limitation: the impossibility for any program to
test a variable, except at the lowest level.

We present here a different solution for scheduling, which does not rule out
conditionals with high guards. To this end we first formalise what it means for
a system of concurrent programs to be controlled by a scheduler. Essentially,
this means running the system in lockstep with a program that implements the
scheduling policy. To describe controlled execution, we use a construction $P[Q]$,
which makes $P$ and $Q$ move hand in hand, but allows the controller, $P$, to
proceed by itself whenever $Q$ is unable to move. Then a system consisting of $n$
parallel programs $P_i$ controlled by a scheduler Sched will be described as:

Sched$[P'_1 \parallel \cdots \parallel P'_n]$

where the $P'_i$ are adaptations of the $P_i$, so that the processes can be triggered
and suspended by the scheduler. To this end we introduce a new construct
when $e$ do $P$, whose semantics is that $P$ is allowed to proceed, for one step,
when the condition $e$ holds. It is technically convenient to introduce another
level in the syntax: besides the programs $P$, written according to the grammar
given in Section 2, there is a set of “systems” $S, T$ built as follows:

S, T ::= P | S ∥ T | S[T] | when e do S

Letting $w(S)$ denote the set of variables written (assigned to) by $S$, the construct
$S[T]$ is only legal under the condition $w(S) \cap w(T) = \emptyset$.

² It is also suggested there that a better approach to scheduling would be probabilistic.
Indeed a whole line of research on probabilistic noninterference has been developed, 
but this will not be our concern here, where we stick to a possibilistic setting. 
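(Control-Op1)   $\frac{(S, \mu) \to (S', \mu'), \quad (T, \mu) \to (T', \mu'')}{(S[T], \mu) \to (S'[T'], \mu' \sqcup_{\mu} \mu'')}$

(Control-Op2)   $\frac{(S, \mu) \to (S', \mu'), \quad (T, \mu) \not\to}{(S[T], \mu) \to (S'[T], \mu')}$

(When-Op)   $\frac{\mu(e) = \mathit{tt}, \quad (S, \mu) \to (S', \mu')}{(\text{when } e \text{ do } S, \mu) \to (\text{when } e \text{ do } S', \mu')}$

Fig. 4. Additional Operational Rules for Systems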



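(Control)   $\frac{\Gamma \vdash S : (\tau, \sigma)\ \mathit{cmd}, \quad \Gamma \vdash T : (\tau, \sigma)\ \mathit{cmd}, \quad \tau \geq \sigma}{\Gamma \vdash S[T] : (\tau, \sigma)\ \mathit{cmd}}$

(When)   $\frac{\Gamma \vdash e : \theta, \quad \Gamma \vdash S : (\tau, \sigma)\ \mathit{cmd}, \quad \theta \leq \tau}{\Gamma \vdash \text{when } e \text{ do } S : (\tau, \theta \sqcup \sigma)\ \mathit{cmd}}$

Fig. 5. Additional Typing Rules for Systems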



Notation 4.1 We use $(S, \mu) \to$ to mean $\exists S', \mu'$ such that $(S, \mu) \to (S', \mu')$, and
$(S, \mu) \not\to$ for the negation of $(S, \mu) \to$.

The semantics of the new constructs is given in Figure 4, where $\mu' \sqcup_{\mu} \mu''$ represents
the memory $\mu$ with the conjunction of the updates operated by $S$ and by $T$, that
is $\mu' \backslash \mu \cup \mu'' \backslash \mu \cup (\mu' \cap \mu'')$. Then for instance the scheduled programs may be
written $P'_i$ = when $s_i$ do $P_i$, where $s_i$ is the “proceed” signal for program $P_i$, set
up by the scheduler. The following program

Sched = $i := 0$ ; while tt do $i := [i + 1] \bmod n$ ; $k := 0$ ;
        while $k < t$ do $s_i$ := tt ; $s_i$ := ff ; $k := k + 1$

describes a scheduler for a system of $n$ threads, implementing round robin with
time slice $t$, provided that all the $s_i$'s are initially false. It is easy to imagine how
to program other scheduling policies in a similar style.
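
The order in which such a scheduler raises the signals can be sketched in a few lines of Python. This is an illustration under assumptions, not the paper's construction: the generator `round_robin` (a name introduced here) abstracts each pair of assignments that raises and then lowers a signal into a single yielded thread index, and it assumes that the index starts at 0 and is incremented modulo n before each time slice.

```python
from itertools import islice

def round_robin(n, t):
    """Yield, step after step, the index of the thread allowed to proceed."""
    i = 0
    while True:                 # outer loop of the scheduler
        i = (i + 1) % n         # move on to the next thread
        for _ in range(t):      # inner loop: k = 0, ..., t - 1
            yield i             # the signal of thread i is raised, then lowered

# First eight grants for two threads with a time slice of three steps.
grants = list(islice(round_robin(2, 3), 8))
```

Each thread receives t consecutive grants before the next one is served; other policies would only change how the index is updated between slices.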

The typing rules for the new operators are given in Figure 5. The
side-conditions in rules (Control) and (When) need some comments. First, note
that a when statement can induce an implicit flow, just like the conditional and
while statements, as for instance in the system:

when $x_H = 0$ do $y_L := y_L + 1$
This explains the requirement $\theta \leq \tau$ in rule (When). On the other hand, the
condition that the guard of the when statement should affect its guard type may
seem superfluous at first sight, since a when statement can never be followed (in
sequential composition) by any other system. The reason for this condition is
that in a controlled system $S[T]$, a blocked behaviour of the controller $S$ can
create interferences if the controlled system $T$ is low, and this blocked behaviour
of $S$ may be due to a when statement. Consider for instance the system $S[T]$
where $P$ is a high program that does at least one step:

$S$ = when $x_H = 0$ do $P$

$T$ = $y_L := y_L + 1$

We let the reader check that this system can lead to interference. Now if the
when statement $S$ were allowed to have type $(L, L)$ cmd, the whole system $S[T]$
would be typable.

As regards the rule (Control), the condition $\tau \geq \sigma$ excludes for instance - if
the security levels are $L$ and $H$ - systems $S[T]$ whose unique type is $(L, H)$ cmd.
Consider for instance the controlled system $S'[T']$, where:

$S'$ = while $x_H = 0$ do nil
$T'$ = $y_L := 0$ ; $y_L := y_L + 1$

Here again there is a possible interference due to the blocking of the controller
after one step if the guard of the loop is false. Note that the only possible type
of $S'[T']$ would be indeed $(L, H)$ cmd, since it affects a low variable and has a
high loop guard.

To extend our noninterference result to the new setting, we also need to
restrict the typing rule for conditional branching, recording the tested expression
as a guard (note the similarity with the rule for the when statement):

(Cond-Strict)   $\frac{\Gamma \vdash e : \theta, \quad \Gamma \vdash P : (\tau, \sigma)\ \mathit{cmd}, \quad \Gamma \vdash Q : (\tau, \sigma)\ \mathit{cmd}, \quad \theta \leq \tau}{\Gamma \vdash \text{if } e \text{ then } P \text{ else } Q : (\tau, \theta \sqcup \sigma)\ \mathit{cmd}}$

This rules out for instance the thread $\alpha$ of our initial example (4), because a low
assignment can no longer be performed after a high test.

It is easy to check that the Subject Reduction Theorem and the Confinement
Lemma extend to the new language. Similarly, the definitions of $(\Gamma, \omega)$-bisimulation
and $\omega$-boundedness remain formally the same as those for the
base language (modulo the replacement of programs by systems). Obviously,
the Guard Safety Lemma may now be strengthened into:

Lemma 4.2 (Strong Guard Safety) 

If $\Gamma \vdash S : (\tau, \sigma)$ cmd then every loop, conditional or when statement guard in $S$
has type $\theta$ in $\Gamma$, with $\theta \le \sigma$.

Lemma 4.3 (Deterministic Behaviour of $\omega$-Guarded Systems)

If $S$ is $\omega$-guarded in $\Gamma$ and $\mu =_\Gamma^\omega \nu$, then $(S, \mu) \to (S', \mu')$ implies $(S, \nu) \to (S', \nu')$,
with $\mu' =_\Gamma^\omega \nu'$.

We are now able to generalise our noninterference result. 

Theorem 4.4. (Extended noninterference) 

If $S$ is typable in $\Gamma$, then $(S, \mu) \approx_\Gamma^\omega (S, \nu)$ for any $\mu, \nu$ such that $\mu =_\Gamma^\omega \nu$.

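Proof: We define inductively the relation $\mathcal{R}_1^{\Gamma,\omega}$ as follows: $(S, \mu)\, \mathcal{R}_1^{\Gamma,\omega}\, (T, \nu)$ if
and only if $S$ and $T$ are typable, $\mu =_\Gamma^\omega \nu$ and one of the following holds:

1. $(S, \mu)\, \mathcal{R}_0^{\Gamma,\omega}\, (T, \nu)$, where $\mathcal{R}_0^{\Gamma,\omega}$ is the relation considered in the proof of the
noninterference theorem for the base language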

2. {S,p)^‘f{T,,y) 

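3. $S = T$ and $S$ is $\omega$-bounded

4. $S = S_0 \parallel S_1$, $T = T_0 \parallel T_1$ and $(S_i, \mu)\, \mathcal{R}_1^{\Gamma,\omega}\, (T_i, \nu)$

5. $S$ = when $e$ do $S_1$, $T$ = when $e$ do $T_1$, $\Gamma(e) \le \omega$ and $(S_1, \mu)\, \mathcal{R}_1^{\Gamma,\omega}\, (T_1, \nu)$

Then we show that $\mathcal{R}_1^{\Gamma,\omega}$ is a $(\Gamma, \omega)$-bisimulation. In Clause 3, for the case of the
control construct, we use Lemma 4.3. □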

5 Conclusion and Related Work 

We have addressed the question of secure information flow in systems of con- 
current programs. This covers one of the security problems that can arise, for 
instance, when a mobile program visits different sites, namely that of preserving 
the confidentiality of the visited sites’ private data. In fact, in [3], it is shown how
a form of noninterference called non deducibility on composition may be used 
to model also other security properties like authenticity, non repudiation and 
fairness. Noninterference thus appears as a rather interesting notion to study 
when security is concerned. On the other hand, it may be argued 0 that covert 
channels, that is implicit information flows of the kind considered here, are un- 
avoidable in practice, as they can arise also at the hardware level. Thus the 
aim of statically ensuring the absence of covert channels might be a hard one 
to realise. We certainly do not claim here to cover the whole range of possible 
attacks from a hostile party. 

The issue of noninterference has been largely studied in the literature, using 
different models, and it is not our intention here to review the various approaches. 
We focussed on the approach of Volpano et al., as it applies to a fairly standard
language, which can be assumed to be the kernel of more sophisticated practical 
languages. 

The question of secure flow and noninterference has also started to be in- 
vestigated in the setting of process calculi, and in particular in mobile process 
calculi i, iz], mu and m ■ The treatment in the first two papers seems however 
overly restrictive: it amounts (at least in the core calculus) to forbid all con- 
trol flow from actions on high channels to actions on low channels. In 0, the
core calculus is extended with more sophisticated constructs; in the extended 
calculus some actions may be classified as “innocuous”, and the restriction on 
control flow may be relaxed when these actions are involved. The last two pa- 
pers, m and jn], are less restrictive and closer in spirit to our approach, as they 
try to distinguish the dangerous control flow (implementing information flow) 
from the harmless control flow which should not be restricted. Another related 
paper is P, which studies secrecy properties in security protocols expressed in
the spi-calculus.

As concerns noninterference in the presence of scheduling policies, the most 
popular approach has been so far the probabilistic one, taken for instance in m 
and m- Our stand here was to handle scheduling within a possibilistic setting. 

An issue which has not been addressed here, but is planned for future work, 
is the feasibility of checking noninterference using a type inference algorithm, 
in the line of m- Current work is also oriented towards the treatment of more 
realistic languages, as advocated for instance in 0, including exceptions and 
some form of higher-order. 

Acknowledgements. We would like to thank the anonymous referees for help- 

Abstract. We consider the problem of synthesizing distributed controllers
for reactive systems against local specifications. We show that a
larger class of architectures becomes decidable in comparison to the analogous
problem for global specifications. We identify the exact class of
architectures for which the problem is decidable. Our results also show
the decidability of a related realizability problem for local specifications.



1 Introduction 

An open reactive system is one which interacts with its environment on a sys- 
tematic basis and whose behaviour crucially depends on this interaction. The 
key feature of open systems is that one is required to distinguish between the 
capabilities of the system and its environment. Typically, one views such an 
open system as getting inputs from the environment and reacting to it by out- 
putting values. Realizability and controller synthesis problems arise naturally in 
the study of open systems. 

The realizability problem is to determine, given a specification of an open 
system, say as a temporal logic formula, whether there exists a program for the 
system such that no matter how the environment behaves, the overall behaviour 
satisfies the specification. The program will fix the value the system outputs on 
receiving a particular input and this choice can depend on the past history of 
the interaction with the environment. The environment is allowed to input any 
value at any point. 

The controller synthesis problem on the other hand, starts with an open sys- 
tem — often called a plant in this context — and a specification, say, a temporal 
logic formula. The plant is viewed as an existing program which specifies the 
ways in which the system can react to its inputs. The goal now is to come up with
a strategy to interact with the environment in a way allowed by the plant, such 
that all behaviours satisfy the specification. Thus the strategy, in this setting, 
acts as a controller for the plant which restricts the behaviours of the plant so 
that the specification is met. 

There is a wealth of literature on realizability and controller synthesis prob- 
lems for open systems as evidenced in |BL65IRabV^rrbo5^iRW§aPR§5IAL m 
for linear-time specifications and |KV9filMT!I8IK VflflIKMTVflflj for branching- 
time specifications. All of these studies are confined to programs and plants that 
consist of a single sequential module. However, these problems often arise in
a distributed context and here, the main results presently available are due to
Pnueli and Rosner [PR90]. We extend their results here, and the point of departure
is to consider local specifications. Before going into this in more detail, we
wish to mention the work reported in [dAHM00, dAH00, AHK97], which is also
concerned with control-related issues in a distributed setting but not directly
connected to the concerns of the present work.

Pnueli and Rosner consider distributed programs by using the notion of an 
architecture which basically consists of a set of sites and some communication 
channels between them. The sites also have external input and output channels 
through which they interact with the environment. The specification consists of 
linear time temporal logic formulas whose atomic propositions can state proper- 
ties of the values on the external input and output channels. The specification 
is hence a global one which can talk about simultaneous channel values at different
sites. The surprising main result is that even in the case of an architecture
consisting of two sites and no internal communication channels, the realizability
problem is undecidable. They also consider pipeline architectures, which consist
of a linear array of sites $s_0 \to s_1 \to \cdots \to s_n$ with an external input channel
allowed only for $s_0$. They show that the realizability problem is decidable for
this class of architectures.

It turns out that many results of [PR90] extend also to the control-synthesis
problem that we wish to study. Here, we are given in addition to the specification,
a set of programs at each site and the problem is to come up with local strategies
for the sites to restrict the programs; by a local strategy at a site, we mean
a strategy which only knows the values which the input channels to this site
have carried so far. In [PR90], realizability is actually shown to be decidable
for a larger class of architectures (called hierarchical architectures) but only the
decidability results for pipelines carry over for the control-synthesis problem.

In this paper, we drop global specifications since they turn out to be unrea- 
sonably expressive, and instead consider local specifications and, for convenience, 
study only the controller synthesis problem. We identify a special class of architectures
called clean pipelines (see Figure 1), which are just like the pipelines
mentioned above except that external inputs are also allowed for the right-end site $s_n$.
Our main result is that the controller synthesis problem for local specifications
is decidable for an architecture $\mathcal{A}$ iff each connected component of $\mathcal{A}$ is a clean
pipeline or is a sub-architecture of a clean pipeline.

Our undecidability results go through for weaker acceptance conditions down 
to reachability. Thus our negative results show that even in the presence of 
local specifications, the controller synthesis problem is intractable for almost 
all architectures. However, on the positive side, our results show that for local 
specifications, one can handle the nontrivial distributed reactive system which 
consists of two sites, both of them interacting with the environment, and with 
an internal channel from one to the other. Indeed, our results show that the 
realizability problem can also be effectively solved in this important setting. 
where the specifications at the sites can state properties of the internal channels
as well. 

The undecidability result of [PR90] follows from the study of multi-player
games of incomplete information by Peterson and Reif [PR79]. In this context,
our results show that there are certain games where two players, playing against 
an adversary, have incomplete information about each other, and yet determining 
whether they have a winning strategy is decidable. 

In the next section we introduce the formal setting for our work, Section 3 
establishes our decidability results and in Section 4 we prove the undecidability 
results. Due to lack of space, we provide only the main proof ideas; more
details can be found in [MT01].

2 Problem Setting 



An architecture is a tuple $\mathcal{A} = (S, X, T, r, w)$ where $S = \{s_1, \ldots, s_k\}$ is a finite
nonempty set of sites, $X = \{x_1, \ldots, x_l\}$ is a set of external (or environment)
input channels and $T = \{t_1, \ldots, t_m\}$ is a set of internal channels. $r$ is a function
$r : X \cup T \to S$ which identifies for each channel a process which reads the
channel; $w : T \to S$ identifies for each internal channel a process which writes
into it. We assume, without loss of generality, that each process has at most one
external input channel and that there is at most one channel from one site to
another.

We represent architectures graphically as directed graphs whose nodes are
the sites and every channel $z \in X \cup T$ is represented by an edge: if $z \in T$, then
it is an edge from $w(z)$ to $r(z)$ and if $z \in X$, then it is a sourceless edge to $r(z)$.
We only consider acyclic architectures, i.e. those architectures whose graph
representation does not have a directed cycle. We assume further that every site
has at least one input (external or internal) channel.

In our framework, each site runs a program which reads its external and 
internal channel inputs and reacts by sending outputs along the internal channels 
to other processes and changing its state. The moves are synchronous — i.e. 
the programs read a set of external inputs and make one collective move while 
respecting the partial order imposed by the architecture. 

For example, in the architecture $\mathcal{A}_2$ in Figure 2, in a synchronous step, $s_1$
will read the environment's input on $x_1$, change its state and write onto $t_1$; $s_2$
will read this input on $t_1$ and the input on $x_2$, change its state and write onto
$t_2$; $s_3$ will read this value and change its internal state.

For a site $s \in S$, let $in(s) = r^{-1}(s)$, the set of channels which $s$ reads from,
and $out(s) = w^{-1}(s)$, the set of channels $s$ writes into. Given an architecture $\mathcal{A}$,
a domain definition for $\mathcal{A}$ is a function $D$ which associates with each $z \in X \cup T$
a finite set of values which can be sent along the channel $z$. We sometimes denote $D(z)$
as $D_z$. For a set of channels $Z$, a valuation function for $Z$ is a function
$h$ whose domain is $Z$ and which maps each $z \in Z$ to an element of $D_z$. Let $\mathcal{H}_Z$
denote the set of all valuation functions for $Z$.
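
The definitions of this section can be made concrete with a small data structure. The following is only an illustrative sketch in Python, not part of the formal development: the class name `Architecture`, the helper `valuations`, the instance `A2` and the domains in `D` are all assumptions chosen for the example. The instance follows the textual description of the three-site architecture in which $s_1$ reads $x_1$ and writes $t_1$, $s_2$ reads $t_1$ and $x_2$ and writes $t_2$, and $s_3$ reads $t_2$.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Architecture:
    sites: tuple      # S
    external: tuple   # X, external input channels
    internal: tuple   # T, internal channels
    reader: dict      # r : X ∪ T -> S
    writer: dict      # w : T -> S

    def inputs(self, s):
        """in(s): the channels that site s reads from."""
        return {z for z in self.external + self.internal if self.reader[z] == s}

    def outputs(self, s):
        """out(s): the channels that site s writes into."""
        return {t for t in self.internal if self.writer[t] == s}

    def is_acyclic(self):
        """Peel off sites with no incoming internal edge until none are left."""
        edges = {(self.writer[t], self.reader[t]) for t in self.internal}
        remaining = set(self.sites)
        while remaining:
            sources = {s for s in remaining
                       if not any(v == s and u in remaining for u, v in edges)}
            if not sources:
                return False  # every remaining site has a remaining predecessor
            remaining -= sources
        return True


def valuations(channels, domain):
    """All valuation functions for a set of channels Z, each as a dict."""
    chans = sorted(channels)
    return [dict(zip(chans, vals))
            for vals in product(*(sorted(domain[z]) for z in chans))]


# Hypothetical instance: x1 -> s1 -t1-> s2 -t2-> s3, with x2 also read by s2.
A2 = Architecture(
    sites=("s1", "s2", "s3"),
    external=("x1", "x2"),
    internal=("t1", "t2"),
    reader={"x1": "s1", "x2": "s2", "t1": "s2", "t2": "s3"},
    writer={"t1": "s1", "t2": "s2"},
)
# Assumed finite domains D(z), chosen arbitrarily for illustration.
D = {"x1": {0, 1}, "x2": {0, 1}, "t1": {0, 1, 2}, "t2": {0, 1}}
```

With these assumed domains, `valuations(A2.inputs("s2"), D)` enumerates the six valuation functions for the channels read by $s_2$.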
Definition 1. A plant is a tuple $(\mathcal{A}, D, P)$ where $\mathcal{A}$ is an architecture (say having
$k$ sites $\{s_1, \ldots, s_k\}$), $D$ is a domain definition for $\mathcal{A}$, and $P$ is a set of programs,
one at each site, i.e. $P$ is a tuple $(P_1, \ldots, P_k)$. Each $P_i$ is a transition system
$(Q_i, q_i^{in}, \delta_i)$ where $Q_i$ is a set of states, $q_i^{in} \in Q_i$ is the initial state, and $\delta_i$ is a
non-deterministic transition function $\delta_i : Q_i \times \mathcal{H}_{in(s_i)} \to \mathcal{P}(Q_i \times \mathcal{H}_{out(s_i)})$.

The transition function of each program defines the different ways in which a 
site can react to a set of inputs on its input channels, by giving the possible 
sets of values which can be written on the output channels together with a 
corresponding state change. We say a plant is finite if Qi is finite for each Pi. 

Let $(\mathcal{A}, D, P)$ be a plant. For a program $P = (Q, q^{in}, \delta)$ at a site $s$ in $\mathcal{A}$, a
local strategy for $P$ is a function $f : Q \times \mathcal{H}_{in(s)}^{+} \to Q \times \mathcal{H}_{out(s)}$ such that $\forall q \in Q$,
$\pi \in \mathcal{H}_{in(s)}^{+}$, if $\pi = \pi' \cdot h$ then $f(q, \pi) \in \delta(q, h)$. Thus the local strategy $f$ is an
advice function for $P$ which looks at the history of values ($\pi'$) on the local input
channels and the current values on them ($h$), and prescribes a move which the
local program $P$ should take.

A distributed control-strategy is a set of local strategies, one for each site,
i.e. a tuple $f = (f_1, \ldots, f_k)$ where $f_i$ is a local strategy for $P_i$. We call a plant
along with a strategy, $((\mathcal{A}, D, P), f)$, a controlled system. Let us fix for now a
controlled system $((\mathcal{A}, D, P), f)$.

We need some notation for talking about sequences. For a sequence $\alpha$, let
$\alpha[i]$ denote the $i$-th element of $\alpha$ and $\alpha[i,j]$ denote the finite subsequence of $\alpha$ from
the $i$-th to the $j$-th element, both inclusive, for $0 \le i \le j$, $i, j \in \mathbb{N}$. Also, we denote
by $inf(\alpha)$ the set of elements which occur infinitely often in $\alpha$. If $\alpha$ is a sequence
of functions on a domain $Z$, let $\alpha \downarrow Z'$, where $Z' \subseteq Z$, denote the sequence of
functions obtained by restricting each function in $\alpha$ to $Z'$.

Consider an environment input sequence (on the channels in $X$) $\alpha \in (\mathcal{H}_X)^\omega$.
Since $P$, when controlled by the strategy $f$, is deterministic, there is a unique
way in which the plant and controller respond to the external input, i.e. there
is a unique sequence of states each program takes and a unique set of channel
values sent along each channel. We can define these sequences as follows. Let
$\beta \in (\mathcal{H}_{X \cup T})^\omega$ and $\gamma \in (Q_1 \times \cdots \times Q_k)^\omega$ be such that:

1. $\gamma[0] = (q_1^{in}, \ldots, q_k^{in})$

2. $\beta \downarrow X = \alpha$

3. $\forall t \in T$, $j \in \mathbb{N}$, if $w(t) = s_i$ and $\gamma[j] = (q_1, \ldots, q_k)$ with
$f_i(q_i, \beta[0,j] \downarrow in(s_i)) = (q', h)$, then $\beta[j](t) = h(t)$

4. $\forall j \in \mathbb{N}$, if $\gamma[j] = (q_1, \ldots, q_k)$ and $\gamma[j+1] = (q'_1, \ldots, q'_k)$, then it must be the case
that $\forall i \in \{1, \ldots, k\}$: $f_i(q_i, \beta[0,j] \downarrow in(s_i)) = (q'_i, h)$ for some $h \in \mathcal{H}_{out(s_i)}$

The definitions above formalize how the programs behave when they get an
external input. (1) says that the global state-sequence starts with the initial
states. The next condition requires that the values which the external channels
take are defined by the external input $\alpha$. (3) and (4) demand that the values
the internal channels take and the evolution of states are according to the moves
defined by the local strategy. It is easy to see that, since the architecture is
acyclic, there are unique sequences $\beta$ and $\gamma$ which satisfy the above conditions.
We call $\gamma$ the state-behaviour of the system for the input $\alpha$.
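
The uniqueness argument is constructive: within one synchronous step, the sites can be processed in a topological order of the (acyclic) architecture, so that every channel a site reads has already received its value. The following Python sketch computes finite prefixes of the channel sequence and of the state-behaviour for a finite input prefix. It is only an illustration under stated assumptions: the function name `state_behaviour`, the dictionary-based encoding, and the two-site example are hypothetical, and each local strategy is simply trusted to return a move permitted by its program (membership in the transition function is not checked).

```python
def state_behaviour(order, reader, init, strategies, alpha):
    """Finite prefix of the channel values and global states for a controlled system.

    order      -- the sites, listed in a topological order of the architecture
    reader     -- dict mapping every channel (external or internal) to its reading site
    init       -- dict mapping each site to its initial state
    strategies -- dict mapping each site to a function f(q, history) -> (q2, h),
                  where history is the list of local input valuations so far
                  (current one included) and h assigns values to the site's outputs
    alpha      -- list of valuations (dicts) of the external channels
    """
    state = dict(init)
    gamma = [dict(state)]            # global states, starting with the initial ones
    beta = []                        # one valuation of all channels per step
    history = {s: [] for s in order}
    for a in alpha:
        step = dict(a)               # external channels carry the environment's input
        nxt = {}
        for s in order:
            # Channels read by s already have values because of the topological order.
            local = {z: step[z] for z in reader if reader[z] == s}
            history[s].append(local)
            q2, h = strategies[s](state[s], history[s])
            nxt[s] = q2
            step.update(h)           # values written by s onto its output channels
        state = nxt
        beta.append(step)
        gamma.append(dict(state))
    return beta, gamma


# Hypothetical two-site example: x1 -> s1 -t1-> s2.
reader = {"x1": "s1", "t1": "s2"}
strategies = {
    # s1 copies its current input onto t1 and never changes state.
    "s1": lambda q, hist: (q, {"t1": hist[-1]["x1"]}),
    # s2 remembers the parity of the number of 1s seen on t1; it has no outputs.
    "s2": lambda q, hist: ((q + hist[-1]["t1"]) % 2, {}),
}
beta, gamma = state_behaviour(
    ["s1", "s2"], reader, {"s1": 0, "s2": 0}, strategies,
    [{"x1": 1}, {"x1": 0}, {"x1": 1}],
)
```

Here `gamma` lists the global states visited, and `beta` lists, for each step, the value carried by every channel.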

The specification is now defined on the local state-behaviours of the system. 
Since we wish to capture local linear-time properties, we have local Rabin win- 
ning conditions and the specification then demands that the local runs of the 
controlled system meet these conditions. We could instead work with temporal
logic specifications, one at each site, which describe the behaviour of the local
channels (external and internal) and the local states. However, since we can cast
this problem in terms of Rabin conditions (by building a deterministic Rabin
automaton accepting the desired behaviours [Saf88] and taking its intersection
with the local plant), we can reduce this problem to our setting, coded into the
states of the plant.

A local Rabin winning condition $\mathcal{R}_i$ for a site $s_i$ is a set
$\{(R_1, G_1), \ldots, (R_m, G_m)\}$ where $R_j, G_j$ are subsets of $Q_i$. A Rabin winning
condition $W$ is a tuple $(\mathcal{R}_1, \ldots, \mathcal{R}_k)$ where each $\mathcal{R}_i$ is a local Rabin winning
condition for $s_i$. Let $\gamma \in (Q_1 \times \cdots \times Q_k)^\omega$ be a sequence of global states of
the system. Let $\gamma \downarrow i$ denote the sequence in $Q_i^\omega$ obtained by projecting $\gamma$ to
the component involving $Q_i$. $\gamma$ is said to satisfy a Rabin condition $W$ if for
each site $s_i$, there is a pair $(R, G)$ in $\mathcal{R}_i$ such that $inf(\gamma \downarrow i) \cap R = \emptyset$ and
$inf(\gamma \downarrow i) \cap G \neq \emptyset$.
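
For an ultimately periodic sequence of global states (a finite prefix followed by a cycle repeated forever), the states occurring infinitely often are exactly those on the cycle, so the Rabin condition can be checked directly. The following Python sketch illustrates this; the function names and the encoding of global states as tuples are assumptions made for the example, not notation from the paper.

```python
def satisfies_local(cycle_states, pairs):
    """Local Rabin check: some pair (R, G) has inf ∩ R empty and inf ∩ G nonempty."""
    inf = set(cycle_states)  # on a lasso, inf is exactly the set of states on the cycle
    return any(inf.isdisjoint(R) and not inf.isdisjoint(G) for R, G in pairs)


def satisfies_rabin(cycle, W):
    """Check a tuple W of local Rabin conditions on the lasso prefix . cycle^omega.

    cycle -- nonempty list of global states, each a tuple with one entry per site
    W     -- list of local conditions; W[i] is a list of (R, G) pairs of sets
    The finite prefix of the lasso is irrelevant and therefore not a parameter.
    """
    if not cycle:
        raise ValueError("the cycle of a lasso must be nonempty")
    return all(
        satisfies_local([g[i] for g in cycle], W[i])
        for i in range(len(W))
    )


# Hypothetical two-site example.
cycle = [("a", "p"), ("b", "p")]
W_good = [[({"c"}, {"a"})], [({"q"}, {"p"})]]
W_bad = [[({"b"}, {"a"})], [({"q"}, {"p"})]]
```

In this example `satisfies_rabin(cycle, W_good)` holds, whereas `W_bad` fails at the first site because the state `b` in its forbidden set recurs on the cycle.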

Finally, a controlled system is said to satisfy a Rabin winning condition $W$ if
for every sequence of external inputs $\alpha \in (\mathcal{H}_X)^\omega$, the state-behaviour $\gamma$ defined
by $\alpha$ satisfies $W$. Given a winning condition, a strategy $f$ for a plant $(\mathcal{A}, D, P)$
is said to be winning if the controlled system $((\mathcal{A}, D, P), f)$ satisfies the winning
condition. Now, the control-synthesis problem for an architecture $\mathcal{A}$ is: Given a
finite plant $(\mathcal{A}, D, P)$, and a Rabin winning condition, does there exist a winning
strategy for the plant?

An important class of architectures is the class of clean pipelines, which are
pipelines that have external inputs only at the two endpoints (see Fig. 1). An
architecture $\mathcal{A}$ is said to be a clean pipeline if it has $k$ sites $s_1, \ldots, s_k$ (for some
$k \in \mathbb{N}$), two external input channels $x_1$ and $x_2$, $k - 1$ internal channels $t_1, \ldots, t_{k-1}$,
with $r(x_1) = s_1$, $r(x_2) = s_k$, $w(t_i) = s_i$ and $r(t_i) = s_{i+1}$, for $i \in \{1, \ldots, k - 1\}$.
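
The definition can be tested mechanically, up to renaming of sites and channels. The following Python sketch is one possible check, given only as an illustration: the function name `is_clean_pipeline` and the dictionary encoding of the reader and writer maps are assumptions, and the check covers clean pipelines themselves, not their sub-architectures.

```python
def is_clean_pipeline(sites, external, internal, reader, writer):
    """Check whether an architecture is a clean pipeline, up to renaming."""
    k = len(sites)
    if k == 0 or len(external) != 2 or len(internal) != k - 1:
        return False
    succ, has_pred = {}, set()
    for t in internal:
        u, v = writer[t], reader[t]
        if u in succ or v in has_pred:
            return False          # a site with two outgoing or two incoming channels
        succ[u] = v
        has_pred.add(v)
    starts = [s for s in sites if s not in has_pred]
    if len(starts) != 1:
        return False
    chain = [starts[0]]           # follow the internal channels from the left end
    while chain[-1] in succ:
        chain.append(succ[chain[-1]])
    if len(chain) != k:
        return False              # some site is not on the chain
    # One external input at the left end and one at the right end.
    return sorted(reader[x] for x in external) == sorted([chain[0], chain[-1]])


# Hypothetical three-site examples.
clean = is_clean_pipeline(
    ("s1", "s2", "s3"), ("x1", "x2"), ("t1", "t2"),
    {"x1": "s1", "x2": "s3", "t1": "s2", "t2": "s3"},
    {"t1": "s1", "t2": "s2"},
)
not_clean = is_clean_pipeline(
    ("s1", "s2", "s3"), ("x1", "x2"), ("t1", "t2"),
    {"x1": "s1", "x2": "s2", "t1": "s2", "t2": "s3"},   # x2 enters the middle site
    {"t1": "s1", "t2": "s2"},
)
```

The second example differs from the first only in that the second external input is read by the middle site, which is enough to violate the definition.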




Fig. 1. A generic clean pipeline 



We also need the notion of a sub-architecture: an architecture $\mathcal{A}'$ is a sub-architecture
of an architecture $\mathcal{A}$ if the graph of $\mathcal{A}'$ is isomorphic to a subgraph
of the graph of $\mathcal{A}$. Note that an architecture is a sub-architecture of itself. We
can now state the main result of the paper.

Theorem 1. Let $\mathcal{A}$ be an architecture. The control-synthesis problem for $\mathcal{A}$ is
decidable iff each connected component of $\mathcal{A}$ is a sub-architecture of a clean
pipeline. □



3 Decidable Architectures 

In this section, we show that architectures where each connected component is
a sub-architecture of a clean pipeline are decidable. Since we have local winning
conditions, it is easy to observe that the control-synthesis problem for an archi- 
tecture A is decidable iff the control-synthesis problem is decidable for each of its 
connected components. Hence it suffices to prove that the problem is decidable 
for architectures which are sub-architectures of clean pipelines. 

We use alternating and non-deterministic tree automata to prove our decidability
results. Due to lack of space, we assume a standard presentation of these
automata [Tho90, MS95].

A tree is a directed acyclic graph $T = (V, E)$ which has a designated root $r$
which does not have a parent. Every other node is reachable from $r$ and has a
unique parent. We say $v'$ is a child of $v$ if $(v, v') \in E$. For a set $\Sigma$, a $\Sigma$-labelled
tree is a pair $(T, \tau)$ where $T = (V, E)$ is a tree and $\tau : V \to \Sigma$. Let $\Gamma$ be a finite
set. Then $\Gamma^*$ can be viewed as a tree $T_\Gamma = (\Gamma^*, E)$ where $(x, x \cdot d) \in E$, for every
$x$ in $\Gamma^*$ and $d$ in $\Gamma$. We refer to this as the $\Gamma$-tree.

Consider a plant $(\mathcal{A}, D, P)$ and a distributed control-strategy $f$ for it. Let $s$
be a site with an output channel $t$ in $\mathcal{A}$. Let $\mathcal{L} \subseteq D_t^\omega$ be the language of infinite
strings output on $t$ (by considering all possible inputs on the external input
channels of the plant). We call such a language of infinite words a communication
language for the channel $t$. Let $Pref(\mathcal{L}) = \{x \mid \exists \beta \in \mathcal{L}, x \text{ is a prefix of } \beta\}$.
Then it is not difficult to see that $\mathcal{L} = lim(Pref(\mathcal{L}))$ where
$lim(L) = \{\alpha \in \Sigma^\omega \mid \text{for every prefix } x \text{ of } \alpha, x \in L\}$. Also, $\mathcal{L} \neq \emptyset$.

So $L = Pref(\mathcal{L}) \subseteq D_t^*$, the set of finite sequences sent along $t$, represents the
set of infinite sequences sent along the channel as well. So $\mathcal{L}$ can be represented
(uniquely) by a $\{\top, \bot\}$-labelled $D_t$-tree $T = (T_{D_t}, \tau)$ where $\tau(x) = \top$ if $x \in L$
and $\tau(x) = \bot$ otherwise. In such a tree, if a node has label $\top$ then it will have
at least one child with the label $\top$ and if a node has label $\bot$ then all its children
will have the label $\bot$. Also, the root $\epsilon$ is labelled $\top$. Clearly each such $\{\top, \bot\}$-labelled
$D_t$-tree (which we call $t$-type trees) uniquely represents a communication
language of the channel $t$ and our automata run over such trees. If $T$ is a $t$-type
tree then we let $Lang(T)$ denote the language of infinite strings it represents.
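
The tree encoding of a communication language can be explored on finite approximations. The following Python sketch is purely illustrative and rests on several assumptions: infinite words are given as ultimately periodic pairs $(u, v)$ standing for $u v^\omega$, channel values are single characters, the tree is truncated at a fixed depth, and all function names are hypothetical. It labels every node up to the given depth (with `True` for $\top$ and `False` for $\bot$) and checks the three structural properties stated above.

```python
from itertools import product


def prefixes_up_to(lassos, depth):
    """Finite prefixes (of length at most depth) of the words u v^omega."""
    pref = set()
    for u, v in lassos:
        if not v:
            raise ValueError("the repeated part of a lasso must be nonempty")
        word = u
        while len(word) < depth:
            word += v
        for i in range(depth + 1):
            pref.add(word[:i])
    return pref


def t_type_labels(domain, lassos, depth):
    """Label each node of the domain-tree up to depth: True for prefixes, else False."""
    pref = prefixes_up_to(lassos, depth)
    nodes = ["".join(p) for n in range(depth + 1) for p in product(domain, repeat=n)]
    return {x: (x in pref) for x in nodes}


def is_t_type(labels, domain, depth):
    """Check the structural conditions on every node strictly above the cut-off depth."""
    if not labels[""]:
        return False                       # the root must be labelled top
    for x, top in labels.items():
        if len(x) == depth:
            continue                       # children lie beyond the truncation
        kids = [labels[x + d] for d in domain]
        if top and not any(kids):
            return False                   # a top node needs at least one top child
        if not top and any(kids):
            return False                   # a bottom node has only bottom children
    return True


# Hypothetical language {a b^omega, (ab)^omega} over the domain {a, b}.
labels = t_type_labels("ab", [("a", "b"), ("", "ab")], 3)
```

For this example the nodes labelled $\top$ up to depth 3 are the empty word, `a`, `ab`, `aba` and `abb`.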

Let us fix a clean pipeline which has $k$ sites, as shown in Figure 1. We refer
to $s_1$ as the left-site, $s_k$ as the right-site and each of the $s_i$'s, $1 < i < k$, as middle
sites.

Let $s$ be the left-site of a clean pipeline with input channel $x$ and output
channel $t$. Suppose $P$ is the program at $s$, there is a local winning strategy
$f$ at $s$, and $\mathcal{L}$ is the language of infinite words sent along $t$ as a result of the pair
$(P, f)$ reacting to all possible inputs on $x$. Then $\mathcal{L}$ is said to be an $s$-successful
language. For a right-site $s$, with input channels $t$ and $x$, we say that a language
$\mathcal{L}$ of infinite strings over $D_t$ is $s$-successful if there is a strategy $f$ at $s$ which can
work on these inputs, and arbitrary inputs on $x$, and win locally. For a middle-site
$s$ with input channel $t$ and output channel $t'$, we say that $\mathcal{L}' \subseteq D_{t'}^\omega$ is
successfully generable by $s$ on $\mathcal{L} \subseteq D_t^\omega$ if there is a strategy at $s$ which wins
locally on reading inputs from $\mathcal{L}$ on $t$ and produces $\mathcal{L}'$ on $t'$.

Lemma 1. Let s be the left-site of a clean pipeline with input channel x, output 
channel t and program P. Then there is an alternating tree automaton (on t- 
type trees) which accepts a t-type tree T iff the language represented by T has an 
s-successful sublanguage. 

Proof: The automaton we construct, while running over a $t$-type tree $T$,
guesses a local strategy $f$ for the program at $s$, makes sure that $f$ produces no
string which is not in $Lang(T)$ and also checks that $f$ is locally winning.

The automaton has, in its state-space, a component encoding which state of
the program it is currently simulating. Then reading a node $y$ of $T$, it does the
following:

- Guess a set of moves from the current state for each possible input $d$ in $D(x)$.

- The automaton propagates, for each $d \in D(x)$, a copy along the direction
$d' \in D(t)$ where $d'$ is the output of the plant on $d$ according to the guessed
move. The corresponding successor state of the program is also propagated
and the automaton will check in these copies whether the labels of the nodes
it reads are $\top$. This will ensure that the outputs are allowed by $T$.

The acceptance condition ensures that paths on a run encode state-sequences 
which satisfy the local winning condition of P. Since each node in a run corre- 
sponds to a unique input history, the independent guessing of the strategy at 
these nodes is justified. □ 

Note that the automaton accepts a tree provided the language represented 
by the tree merely contains an s-successful language. It seems hard to strengthen 
this containment to equality. However, the present version will suffice. 

Lemma 2. Let s be a right-site of a pipeline with in{s) = {x,t} and let the 
program at s be P. Then there is an alternating tree automaton on t-type trees 
which accepts a tree T iff the language that T represents is s-successful. 

Proof: The automaton guesses a local strategy $f$ for $P$ at $s$ on input sequences
$\alpha \in Lang(T)$ along $t$ and arbitrary input sequences $\beta \in D(x)^\omega$ on $x$ and makes
sure that $f$ is winning for all local runs on these sequences.

The automaton keeps track, in its state-space, of the current state of $P$ it is
simulating. Reading a node $y$ of the input tree, it does the following:

- Guess $Y \subseteq D(t)$ corresponding to the set of successors of $y$ labelled $\top$. The
automaton will (in its next move) check if $Y$ is indeed the set of $\top$-successors.

- The strategy has to handle all inputs in $Y$ on the channel $t$ along with
an arbitrary input in $D(x)$ on channel $x$. The automaton guesses such a
strategy at this point by guessing moves from the current state of $P$ on each
$h \in \mathcal{H}_{in(s)}$ with $h(t) \in Y$. It then propagates along each direction $d$ in $Y$,
a copy of the automaton for each $d' \in D(x)$ corresponding to the chosen
move when channel $t$ carries $d$ and channel $x$ carries $d'$. It propagates the
corresponding state of $P$ as well.

The acceptance condition is that all paths on a run encode a state-sequence 
in P which satisfies the local winning condition of P. Again, since each node in a 
run corresponds to a unique input history, the guessing of the strategy at these 
points independently is justified. □ 

Theorem 2. The two-site clean pipeline is decidable. 

Proof: Let the sites and channels of the pipeline be labelled as in Figure 1.
Using Lemma 1, construct an automaton $A_1$ which accepts a $t_1$-type tree $T$ iff $s_1$
can successfully generate a sublanguage of $Lang(T)$. Using Lemma 2, construct
$A_2$ which accepts $t_1$-type trees which represent languages which $s_2$ can win on.
The claim now is that a distributed winning strategy exists iff $L(A_1) \cap L(A_2)$ is
nonempty.

Assume $T \in L(A_1) \cap L(A_2)$ and let $\mathcal{L}$ be the language it represents. Then
there is a strategy $f_2$ at $s_2$ which wins on $\mathcal{L}$. Also, there is a local winning strategy
$f_1$ at $s_1$ which generates a sublanguage $\mathcal{L}'$ of $\mathcal{L}$. However, since the local winning
conditions are linear-time specifications, $f_2$ wins on $\mathcal{L}'$ as well. Hence $(f_1, f_2)$ is
a distributed winning strategy. Furthermore, one can construct, from the runs of
$A_1$ and $A_2$ on a regular tree in $L(A_1) \cap L(A_2)$, a strategy which can be realized
as a finite-state transition system. Also, it is easy to see that if $(f_1, f_2)$ is any
winning distributed strategy, then the tree corresponding to the language $f_1$
generates is accepted by $A_1$ as well as $A_2$. □

Lemma 3. Let $s$ be a middle-site of a clean pipeline with $in(s) = \{t\}$ and
$out(s) = \{t'\}$, and let the program at $s$ be $P$. Let $A$ be a nondeterministic
automaton accepting $t$-type trees. Then there is an automaton on $\{\top, \bot\}$-labelled
$t'$-type trees that accepts a tree $T'$ iff there is a $t$-type tree $T$ accepted by $A$ and a
language $\mathcal{L}_0 \subseteq Lang(T')$ such that $\mathcal{L}_0$ is successfully generable by $s$ on $Lang(T)$.

Proof: Let $T'$ be an input to the automaton and $\mathcal{L}'$ be the language it
represents. The automaton, while reading $T'$, guesses a $t$-type tree $T$, guesses a
run of $A$ on $T$, guesses a strategy $f$ on the strings represented in $T$ and makes
sure that the run on $T$ is accepting, makes sure that the strategy outputs strings
which are included in $\mathcal{L}'$ and makes sure that the strategy locally wins.

A node in the run on $T'$ corresponds to a node $y'$ in $T'$ as well as a node
$x$ of the tree $T$ being guessed; here $x$ is the sequence in $D(t)^*$ on which the
guessed strategy has output $y'$. Note that each sequence in $D(t)^*$ can lead to at
most one sequence in $D(t')^*$ being output and hence guessing of the tree $T$ at
nodes of the run is justified. (If the site also has an external input, this will not
be the case.)

The state-space of the automaton has both the current state of $P$ as well as
a state of the automaton $A$ which represents the state-label of the corresponding
node in $T$ in the guessed run on $T$. The automaton at a node in the run
corresponding to the node $y'$ in $T'$ and $x$ in $T$ does the following:

- Guess the set $Y' \subseteq D(t')$ which corresponds to the children of $y'$ labelled $\top$.

- Guess the labels of the children of $x$ in $T$. This is the point where $T$ is being
guessed. Let $X \subseteq D(t)$ be the children of $x$ labelled $\top$.

- The automaton now guesses a move of $P$ from the current state on each
$d \in X$ and makes sure that the output on $t'$ is in $Y'$. It then propagates along
each direction $d' \in Y'$ in $T'$ many copies of itself, each corresponding to a
$d \in D(t)$ on which the guessed move outputs $d'$. The appropriate successor
state of $P$ is propagated. The automaton also guesses a transition of $A$ from
the node $x$ and propagates these automaton states as well.

The acceptance condition makes sure that along any path in the run, the state- 
sequence of P meets the local winning condition of P and the state-sequence of 
the automaton meets the winning condition of A. □ 

Theorem 3. The control-synthesis problem for clean pipelines is decidable. 

Proof: We start with the left-site of the pipeline, use Lemma 1 and walk down
the pipeline by successively using Lemma 3. After each site, we have to convert
the alternating automaton we get to a nondeterministic one [MS95] so as to apply
Lemma 3. In the end, we intersect the automaton we have got with that obtained
using Lemma 2. Then by an argument similar to the one in Theorem 2, we can
show that there is a nonempty intersection iff there is a distributed controller
and if a controller exists, we can synthesize one which is finite-state. □

The results imply the decidability of a related realizability problem: given a 
clean pipeline and localised temporal logic specifications at each site, where each 
specification can express properties of the local channels at that site, is there a 
program at each site such that the combined behaviour meets the specification? 
This problem can be reduced to the control-synthesis problem by choosing a 
trivial plant at each site which permits all possible ways of writing onto the 
local output channels. 

4 Undecidable Architectures 

We show now that any architecture which is not a sub-architecture of a clean 
pipeline is undecidable. We show first the undecidability of the three basic ar- 
chitectures in Figure 2. The reductions are from the halting problem for deter- 
ministic Turing machines starting with blank tapes. Our proofs are extensions 
of the undecidability proof developed in [PR90].

A configuration of a deterministic Turing machine M is a sequence C ∈ 
Γ* · Q · Γ+, where Γ is the set of tape symbols and Q is the set of states. If 
C = xqy, with q ∈ Q, then the machine has x · y written on the tape with 
the head positioned on the cell after x. The initial configuration is C_in = q_in · b, 
where q_in is the initial state and b is the special tape symbol called blank. The 
transition relation ⊢ on configurations is defined in the obvious way. We say that 
the machine halts on the blank tape if C_in ⊢* C_h, where the state in C_h is a 
designated halt state. 
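As an illustration only, the successor relation on configurations can be sketched in Python. The encoding is an assumption made for this sketch and is not taken from the paper: a configuration is a list of symbols containing exactly one state, the transition table `delta` maps (state, read symbol) to (new state, written symbol, head move), and the head stays put when asked to move left at the left end of the tape. The function `halts_within` only explores a bounded number of steps, since halting itself is undecidable.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical encoding: a configuration x q y is a list of symbols that
# contains exactly one state symbol; the head reads the first symbol of y.
Delta = Dict[Tuple[str, str], Tuple[str, str, int]]  # (state, read) -> (state', write, move)


def step(config: List[str], states: Set[str], delta: Delta,
         blank: str = "b") -> Optional[List[str]]:
    """Return the successor configuration, or None if no transition applies."""
    i = next(k for k, s in enumerate(config) if s in states)
    q, x, y = config[i], config[:i], config[i + 1:]
    if (q, y[0]) not in delta:
        return None
    q2, written, move = delta[(q, y[0])]
    y = [written] + y[1:]
    if move > 0:                      # head moves right, extending the tape with a blank
        x, y = x + [y[0]], y[1:] or [blank]
    elif move < 0 and x:              # head moves left (assumed to stay put at the left end)
        x, y = x[:-1], [x[-1]] + y
    return x + [q2] + y


def halts_within(delta: Delta, states: Set[str], halt: Set[str],
                 q_in: str = "q_in", blank: str = "b", bound: int = 1000) -> bool:
    """Bounded check that the run from C_in = q_in . b reaches a halt state."""
    config: Optional[List[str]] = [q_in, blank]
    for _ in range(bound):
        if any(s in halt for s in config):
            return True
        config = step(config, states, delta, blank)
        if config is None:
            return False
    return False
```

A `False` answer from `halts_within` is inconclusive when the bound is exhausted; it only reports what happened within `bound` steps.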
Fig. 2. Basic undecidable architectures 



The sites host programs which have associated strategies to output configu- 
rations. The (finite-state) program outputs words in Γ* · Q · Γ+, with the strategy 
deciding which configurations are output. The input channels carry two symbols: 
S (start outputting a new configuration) and N (output the next symbol of the 
current configuration). On receiving S, a program outputs $, followed by a 
configuration while reading N, and finishes the configuration by outputting $. It 
then waits while reading N, outputting a special symbol * during this time, until it 
gets an S, on which it starts outputting another configuration. The first configuration 
output by the program is always C_in. 
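The input/output discipline of such a site program can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `site_outputs` is hypothetical, the strategy is modelled simply as a precomputed list of configurations (given as strings of symbols), and the input is assumed never to contain an S while a configuration is still being output, a case the text does not describe.

```python
from typing import Iterable, List


def site_outputs(inputs: Iterable[str], configs: List[str]) -> List[str]:
    """Map a stream of 'S'/'N' inputs to the symbols output by the site.

    configs is the (assumed) list of configurations chosen by the strategy;
    the first one would be the initial configuration C_in.
    """
    out: List[str] = []
    pending: List[str] = []          # symbols still to output, closing '$' included
    chosen = iter(configs)
    for symbol in inputs:
        if symbol == "S":            # start a new configuration
            pending = list(next(chosen)) + ["$"]
            out.append("$")
        else:                        # 'N': next symbol, or '*' while waiting
            out.append(pending.pop(0) if pending else "*")
    return out
```

For example, with inputs `"SNNNNSN"` and configurations `["qb", "1q"]`, the site emits `$ q b $ * $ 1`.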

Lemma 4. The control-synthesis problem for the architecture A1 is undecidable. 

Proof: The main idea of the proof is to make s1 and s2 always send their current 
states to s3. Site s3 now has the global view of s1 and s2, and a global specification 
of s1 and s2 (exploited in mm to get the undecidability argument) can be 
stated as a local specification for s3. □ 

Lemma 5. The control-synthesis problem for the architecture A2 is undecidable. 

Proof: Site s1 will output configurations when prompted by the environment 
through channel x1. Site s3 will, when prompted by s2 on t2, “accept” configura- 
tions instead of outputting them: when it starts a configuration, it will generate 
it one unit of time in advance and keep the generated symbol of Γ ∪ Q in its state 
space. It proceeds from this state only if the input it receives on t2 is the same 
as the symbol it has committed to. It then proceeds to commit the next symbol. 
This can be looked upon as s3 generating configurations which s2 has to predict 
correctly. 

Site s2 can go into two modes, A and B, the decision being taken according 
to the first environment input on x2. In Mode A, the program at s2 simply 
passes the configurations which it receives on t1 to t2. In Mode B, the program 
first outputs the initial configuration to s3 and after that, each time it receives 
a configuration C on t1, it propagates online C' to t2, where C ⊢ C'. 

If s3 receives a symbol it has not committed to, it goes to a reject state. Mode 
A ensures that the two sites output/accept the same configuration sequences, 
while Mode B ensures that if the i-th configuration output by s1 is C and the 
(i+1)-th configuration accepted by s2 is C', then C ⊢ C'. So the only way the 
plant can hope to win is by s1 and s3 accepting the configuration sequence of 
M. By introducing a winning condition on s2 which makes sure that s2 locally 
wins only if it outputs the halting configuration, one can show that the plant 
has a distributed winning strategy iff M halts on the blank tape. □ 

Lemma 6. The control-synthesis problem for the architecture A3 is undecidable. 

Proof: As done by s3 of A2 in the previous lemma, s1 and s2 will now accept 
configurations of M. Site s can be in two modes, A and B, the mode chosen by the 
first input on x. In Mode A, the program at s passes the initial configuration C_in 
to s1 and makes s2 wait. Then, while getting as input an arbitrary configuration 
C from the environment on x, it passes C to s2 and simultaneously passes C' to 
s1, where C ⊢ C'. Mode B is analogous, with the roles of s1 and s2 interchanged. 

To force s1 and s2 to accept the correct configuration sequence of M, we 
would like the environment to win iff it can get the site scheduled first to be 
unstuck and get the other stuck. The trick is to have another mode C for s 
where the plant is forced to emulate the combined (product) behaviour of s1 
and s2. The winning condition can now be stated on the state space of s. Then 
one can make sure that one of the sites, say s1, wins when it accepts the 
halting configuration. One can show now that a distributed controller exists iff 
M halts on the blank tape. □ 

Using Lemma 4 we can show that any architecture which has a site s with 
two internal input channels is undecidable. The idea is to pick the minimal sites 
above the two internal channels for s and make the rest of the sites “dummy” by 
making them just pass their input to their output and always win locally. We can 
then reduce the control-synthesis problem of A1 to that over this architecture. 
Similarly, using Lemma 6 we can show that any architecture which has a site 
with two internal output channels is undecidable. 

What we are left with are pipelines. Since we require each process to have an 
input channel, the left-site of the pipeline must have an external input channel. 
Consider a pipeline (which is not a clean pipeline) with sites {s'_1, ..., s'_k}, k > 2, 
with s'_i having an external input channel, where 1 < i < k. We can reduce 
the control-synthesis problem for A2 to the control-synthesis problem for this 
pipeline, by coding the program at s1 into s'_1, the program at s2 into s'_i and 
the program at s3 into s'_k. The remaining sites of the pipeline will be “dummy”. 
Hence we have: 

Theorem 4. If A is an architecture which has a connected component which is 
not a sub-architecture of a clean pipeline, then the control-synthesis problem for 
A is undecidable. □ 

The results above can be suitably changed to show that even for weaker winning 
conditions such as Büchi, co-Büchi, or even safety conditions, the architectures 
remain undecidable. 

Acknowledgement. We would like to thank Wolfgang Thomas and Christof 
1 Introduction 



The Ambient calculus is a model for mobile distributed computing. An am- 
bient is the unit of movement. Processes within the same ambient may exchange 
messages; ambients may be nested, so as to form a hierarchical structure. The 
three primitives for movement allow: an ambient to enter another ambient, 
n[in m. P | Q] | m[R] → m[n[P | Q] | R]; an ambient to exit another ambi- 
ent, m[n[out m. P | Q] | R] → n[P | Q] | m[R]; a process to dissolve an am- 
bient boundary, thus obtaining access to its content, open n. P | n[Q] → P | Q. 

Several studies of the basic theory of the Ambient calculus have recently 
appeared, concerning for instance behavioural equivalences, types, logics, static 
analysis techniques. In comparison, little attention has been given 
to implementations. The only implementations of Ambients we are aware of are 
Cardelli’s and Fournet, Levy and Schmitt’s. The latter, formalised as a 
translation of Ambients into the distributed Join Calculus, is the only distributed 
implementation. Although ingenious, the algorithms that these implementations 
use for simulating the ambient reductions are fairly complex. 

One of the difficulties of a distributed implementation of an ambient-like 
language is that each movement operation involves ambients on different hier- 
archical levels. For instance, the ambients affected by an out operation are the 
moving ambient, and its initial and its final parent; at the beginning they re- 
side on three different levels. In Cardelli’s implementation, locks are used to 
achieve a synchronisation among all ambients affected by a movement. In a 
distributed setting, however, this lock-based policy can be expensive. For instance, 
the serialisations introduced diminish the parallelism of the whole system. In 
Fournet, Levy and Schmitt’s implementation, the synchronisations 
are simulated by means of protocols of asynchronous messages. The problems 
of implementation have been a restraint to the development of programming 
languages based on Ambients and to experimentation with Ambients on concrete 
examples. In our opinion, implementation is one of the aspects of Ambients that 
most needs investigation. 

In this paper we study an abstract machine for a distributed implementation 
of an ambient-like calculus. The algorithms of our abstract machine are quite 
different from, and simpler than, those of the two implementations above, mainly 
for two reasons. The first, and the most important, is that the calculus that we 
actually take is typed Safe Ambients (SA) rather than untyped Ambients. SA is a 
variant of the original calculus that eliminates certain forms of interference in 
ambients, the grave interferences. They are produced when an ambient tries to 
perform two different movement operations at the same time, as for instance 
n[in h. P | out n. Q | R]. The control of mobility is obtained in SA by a modification of the 
syntax and a type system. In El the absence of grave interferences is used to 
develop an algebraic theory and prove the correctness of some examples. One of 
the contributions of this paper is to show that the absence of grave interferences 
also brings benefits in implementations. 

The second reason for the differences in our abstract machine is the separation 
between the logical structure of an ambient system and its physical distribution. 
Exploiting this, the interpretation of the movement associated to the capabilities 
is reversed: the movement of the open capability is physical, that is, the location 
of some processes changes, whereas that of in and out is only logical, that 
is, some hierarchical dependencies among ambients may change, but not their 
physical location. Intuitively, in and out are acquisition of access rights, and 
open is exercise of them. 

The differences also show up in the correctness proof of the abstract machine, 
which is much simpler than the correctness proof of the Join implementation. 

Of course another difference is that our algorithms are formulated as an 
abstract machine. The machine is independent of a specific implementation lan- 
guage, and can thus be used as a basis for implementations on different languages. 
In the paper we sketch one such implementation, written in Java. 



2 Safe Ambients: Syntax and Semantics 

We briefly describe typed Safe Ambients (SA). In the reduction rules 
of the original Ambient calculus, mentioned in Section 1, an ambient may enter, 
exit, or open another ambient. The second ambient undergoes the action; it 
has no control on when the action takes place. In SA this is rectified: coactions 
$\overline{in}\ n$, $\overline{out}\ n$, $\overline{open}\ n$ are introduced, with which any movement takes place only if 
both participants agree. The syntax of SA is the following, where n, m, ... are 
names, x, y, ... are variables, and X, Y, ... are recursion variables: 

$M, N ::= x \mid n \mid in\ M \mid \overline{in}\ M \mid out\ M \mid \overline{out}\ M \mid open\ M \mid \overline{open}\ M$ 
$P, Q, R ::= 0 \mid P \mid Q \mid (\nu n)P \mid M.P \mid M[P] \mid \langle M \rangle \mid (x)P \mid X \mid rec\ X.P$ 

Expressions that are not variables or names are the capabilities. We often 
omit the trailing 0 in processes M. 0. Parallel composition has the least syntactic 
precedence, thus m[M.P \ Q] reads m[{M.P) \ Q]. An ambient, or a parallel 
composition, or variable, is unguarded if it is not underneath a capability or an 
abstraction. In a recursion rec X. P, the recursion variable X should be guarded 
in P. For simplicity of presentation we omit path expressions in the syntax. 

Below are the reduction axioms: those for movement, and the communica- 
tion rule (communication is asynchronous, takes place inside ambients, and is 
anonymous — it does not use channel or process names): 
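$n[\,in\ m.P_1 \mid P_2\,] \mid m[\,\overline{in}\ m.Q_1 \mid Q_2\,] \longrightarrow m[\,n[\,P_1 \mid P_2\,] \mid Q_1 \mid Q_2\,]$   [R-in] 

$m[\,n[\,out\ m.P_1 \mid P_2\,] \mid \overline{out}\ m.Q_1 \mid Q_2\,] \longrightarrow n[\,P_1 \mid P_2\,] \mid m[\,Q_1 \mid Q_2\,]$   [R-out] 

$open\ n.P \mid n[\,\overline{open}\ n.Q_1 \mid Q_2\,] \longrightarrow P \mid Q_1 \mid Q_2$   [R-open] 

$\langle M \rangle \mid (x)P \longrightarrow P\{M/x\}$   [R-msg] 
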

The inference rules allow a reduction to occur underneath a restriction, a 
parallel composition, and inside an ambient. Moreover, the structural congruence 
relation (=) can be applied before a reduction step. Structural congruence is used 
to bring the participants of a potential interaction into contiguous positions; its 
definition is standard, and includes rules for commuting the positions of parallel 
components, for stretching the scope of a restriction, for unfolding recursions. 
We write $\Longrightarrow$ for the reflexive and transitive closure of $\longrightarrow$. The use of coactions, 
in the syntax and operational rules, is the only difference between (untyped) SA 
and the original Ambient calculus. 
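The role of coactions in the movement rules can be illustrated with a small Python sketch of R-in on a tree of ambients. This is a simplification under stated assumptions, not the calculus itself: the class `Amb` and function `try_in` are hypothetical names, local processes are reduced to the capabilities at their heads (written as pairs such as `("in", "m")` and `("coin", "m")`, the latter standing for the coaction), and continuations are dropped.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(eq=False)                 # identity-based equality: same name != same ambient
class Amb:
    name: str
    caps: List[Tuple[str, str]] = field(default_factory=list)  # heads of local processes
    subs: List["Amb"] = field(default_factory=list)            # subambients


def try_in(parent: Amb) -> bool:
    """Fire one R-in step between two siblings inside parent, if possible.

    n may enter its sibling m only if n offers ("in", m.name) and m offers
    the matching coaction ("coin", m.name): both participants must agree.
    """
    for n in parent.subs:
        for cap in n.caps:
            if cap[0] != "in":
                continue
            for m in parent.subs:
                if m is not n and m.name == cap[1] and ("coin", m.name) in m.caps:
                    n.caps.remove(cap)                 # both capabilities are consumed
                    m.caps.remove(("coin", m.name))
                    parent.subs.remove(n)              # n becomes a child of m
                    m.subs.append(n)
                    return True
    return False
```

Without the coaction in `m`, `try_in` finds no redex, which is the behaviour that distinguishes SA from the original calculus.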

Up to structural congruence, every ambient in a term can be rewritten into 
a normal form 

n[P_1 | ... | P_s | m_1[Q_1] | ... | m_r[Q_r]] 

where P_i (i = 1..s) does not contain unguarded ambients or unguarded parallel 
compositions. In this case, P_1, ..., P_s are the local processes of the ambient, and 
m_1[Q_1] | ... | m_r[Q_r] are the subambients. 

SA has two main kinds of types: single-threaded and immobile. We consider 
them separately. We begin with the single-threaded types, which we informally 
describe below. We consider immobility types in Section 6. 

The capabilities of the local processes of an ambient control the activities of 
that ambient. In an untyped (or immobile) ambient such control is distributed 
over the local processes: any of them may exercise a capability. In a single- 
threaded (ST) ambient, by contrast, at any moment at most one process has 
the control thread, and may therefore use a capability. An ST ambient n is 
willing to engage in at most one interaction at a time with external or in- 
ternal ambients. Inside n, however, several activities may take place concur- 
rently: for instance, a subambient may reduce, or two subambients may interact 
with each other. Thus, if an ambient n is ST, the following situation, where 
at least two local processes are ready to execute a capability, cannot occur: 
n[in m. P | out h. Q | R]. The control thread may move between processes local 
to an ST ambient by means of an open action. Consider, for instance, a reduction 
$n[\,open\ m.P \mid m[\,\overline{open}\ m.Q\,]\,] \longrightarrow n[\,P \mid Q\,]$ where n and m are ST ambients. 
Initially open m. P has the control thread over n, and $\overline{open}\ m.Q$ over m. At 
the end, m has disappeared; the control thread over n may or may not have 
moved from P to Q, depending on the type of m. If the movement occurs, Q can 
immediately exercise a capability, whereas P cannot; to use further capabilities 
within n, P will have to get the thread back. 

For simplicity, we assume here a strong notion of ST, whereby a value message 
(M) never carries the thread. In El a weaker notion is used, where also messages 
may carry the thread. In the remainder, all processes are assumed to be well- 
typed, and closed (i.e., without free variables). 
3 The Abstract Machine, Informally 

We describe the data structures and the algorithms of the abstract machine, 
called PAN. PAN separates between the logical and the physical distribution 
of the ambients. The logical distribution is given by the tree structure of the 
ambient syntax. The physical distribution is given by the association of a location 
to each ambient. 

In PAN, an ambient named n is represented as a located ambient h: n[P]k, 
where h is the location, or site, at which the ambient runs, k is the location 
of the parent of the ambient, and P collects the processes local to the ambient. 
While the same name may be assigned to several ambients, a location univocally 
identifies an ambient; it can be thought of as its physical address. 

A tree of ambients is rendered, in PAN, by the parallel composition of the 
(unguarded) ambients in the tree. In this sense, the physical and the logical 
topology are separated: the space of physical locations is flat, and each location 
hosts at most one ambient, but each ambient knows the location at which its 
parent resides. For instance, an SA term n[P1 | P2 | m1[Q1] | m2[Q2]], where 
P1 and P2 are the local processes of n, and Qi (i = 1, 2) is a local process of mi 
(i.e., mi has no subambients), becomes in PAN: 

h: n[P1 | P2]root || k1: m1[Q1]h || k2: m2[Q2]h 

where h, k1, k2 are different location names, root is a special name indicating 
the outermost location, and || is parallel composition of located ambients. (The 
above configuration is actually obtained after two creation steps, in which the 
root ambient spawns off the two ambients located at k1 and k2.) Since ambients 
may run at different physical sites, they communicate with each other by means 
of asynchronous messages. 
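The flattening of a tree of ambients into a set of located ambients with parent pointers can be sketched as below. The class and function names (`Ambient`, `Located`, `flatten`) and the scheme for generating fresh location names are assumptions made for illustration; local processes are kept as opaque strings, and all locations are allocated at once, whereas in PAN they are created step by step.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List


@dataclass
class Ambient:                        # an SA ambient in normal form
    name: str
    procs: List[str] = field(default_factory=list)        # local processes (opaque)
    subs: List["Ambient"] = field(default_factory=list)   # subambients


@dataclass
class Located:                        # a located ambient h: n[P]k
    loc: str                          # h, the ambient's own location
    name: str                         # n
    procs: List[str]                  # P
    parent: str                       # k, the location of the parent


def flatten(top: Ambient) -> List[Located]:
    """Turn a tree of ambients into a flat net with one location per ambient."""
    fresh = (f"h{i}" for i in count())    # hypothetical fresh-location supply
    net: List[Located] = []

    def visit(amb: Ambient, parent_loc: str) -> None:
        loc = next(fresh)
        net.append(Located(loc, amb.name, list(amb.procs), parent_loc))
        for sub in amb.subs:
            visit(sub, loc)               # children only record the parent's location

    visit(top, "root")
    return net
```

Note that the parent keeps no record of where its children live; only the upward pointer is stored, as in the example above.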
All the actions (in, out, and open) can modify the logical distribution. Only 
open, however, can modify the physical distribution. The algorithms that PAN 
adopts to model reduction in SA are based on three steps: first, a request message is 
sent upward, from a child ambient that wants to move (logically or physically) 
to its parent; second, a match is detected by the parent itself; third, a completion 
message is sent back to the child, for its relocation. The only exception is the 
algorithm for open, where a further message is needed to migrate the child’s 
local processes to the parent. These steps are sketched in Figure 1, where a, b, c 
represent three ambients, a straight line represents a pointer from an ambient 
to its parent, and a curved line represents the sending of a message. Thus in the 
first row of Figure 1, at the beginning a and b are sibling ambients and c is their 
parent. This figure illustrates an R-IN reduction in which a becomes a child of 
b. In the first phase, a demands to enter b (precisely, if n is the name of b, then 
a demands to enter an ambient with name n), and b accepts an ambient in. 
For this, a and b send requests in and $\overline{in}$ to their parent c (the actual messages 
may also contain the name and location of the sender; these are not shown in the 
figures). In the second phase, c sees that two matching requests have been sent 
and authorises the movement. Finally, in the third phase, c sends completion 
messages to a and b. The message sent to a also contains the location of b, which 
a will use to update its parent field. An ambient that has sent a request to 
its parent but has not yet received an acknowledgement back, goes into a wait 
state, in which it will not send further requests. In the figures, this situation is 
represented by a circle that encloses the ambient. An ambient in a wait state, 
however, can still receive and answer requests from its children and can perform 
local communications. 
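The three phases of the R-IN protocol (request, match, completion) can be sketched as a small message-passing simulation. This is an illustrative sketch under stated assumptions: the class `Net`, its method names, and the tuple encoding of messages are hypothetical, the coaction is written `"coin"`, and in-flight messages are kept in a single FIFO queue, whereas real asynchronous delivery could reorder them.

```python
from collections import deque


class Net:
    def __init__(self):
        self.agents = {}        # location -> {"name", "parent", "waiting", "pending"}
        self.queue = deque()    # in-flight messages: (destination, body)

    def add(self, loc, name, parent):
        self.agents[loc] = {"name": name, "parent": parent,
                            "waiting": False, "pending": []}

    def request(self, loc, kind, target):
        """Phase 1: the agent at loc sends a request upward and enters a wait state."""
        agent = self.agents[loc]
        assert not agent["waiting"], "an agent in a wait state sends no further requests"
        agent["waiting"] = True
        self.queue.append((agent["parent"], (kind, target, loc)))

    def deliver(self):
        """Deliver the oldest in-flight message to its destination."""
        dest, body = self.queue.popleft()
        agent = self.agents[dest]
        if body[0] == "go":                 # phase 3: update the parent field
            agent["parent"], agent["waiting"] = body[1], False
        elif body[0] == "OKin":             # phase 3: the coaction was accepted
            agent["waiting"] = False
        else:                               # a request from a child: store and match
            agent["pending"].append(body)
            self._match(dest)

    def _match(self, loc):
        """Phase 2: the parent looks for an in request and a matching coaction."""
        pending = self.agents[loc]["pending"]
        for req in pending:
            if req[0] != "in":
                continue
            for co in pending:
                if co[0] == "coin" and co[1] == req[1]:
                    pending.remove(req)
                    pending.remove(co)
                    self.queue.append((req[2], ("go", co[2])))   # tell a where b is
                    self.queue.append((co[2], ("OKin",)))
                    return
```

In this sketch the parent never needs a pointer to its children: the locations of a and b arrive inside the requests themselves.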

The second row of Figure 1 sketches an R-OUT reduction. In the first phase, 
ambient a demands to exit its parent b. When b authorises the movement (phase 
2), it sends a an acknowledgement containing the location of the parent of b, 
namely c, and upon receiving this message (phase 3) a updates its parent field. 
The grandparent ambient c is not affected by the dialog between a and b. The 
third row of Figure 1 sketches an R-OPEN reduction. Ambient a accepts to be 
opened, and thus notifies its parent c. If a matching capability exists, that is, one 
of the processes local to c demands to open a, then c authorises a to migrate its 
local processes into c. Ambient a then becomes a forwarder (a ▷ c in the figure) 
whose job is just to forward any messages sent to a on to c. Such a forwarder 
is necessary, in general, because a may have subambients, which would run at 
different locations and which would send their requests of movement to a. 
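The effect of opening an ambient, and the reason a forwarder is left behind, can be sketched as follows. The sketch is a simplification under stated assumptions: the function names `open_child` and `route` are hypothetical, agents are plain dictionaries, and the exchange of migrate and register messages is collapsed into a single step.

```python
def open_child(agents, forwarders, child_loc):
    """Move the child's local processes to its parent and leave a forwarder behind.

    The migrate/register exchange is collapsed into one step (a simplification).
    """
    child = agents.pop(child_loc)
    parent_loc = child["parent"]
    agents[parent_loc]["procs"].extend(child["procs"])
    forwarders[child_loc] = parent_loc        # child_loc now only forwards messages


def route(dest, forwarders):
    """Follow forwarders until a location that still hosts an ambient is reached."""
    while dest in forwarders:
        dest = forwarders[dest]
    return dest
```

A subambient of the opened ambient keeps its old parent pointer; `route` shows how a request it sends there still reaches the ambient that absorbed the processes.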

Using R-OPEN, rather than R-IN or R-OUT, for the physical movements may 
appear counterintuitive. One should however bear in mind that, in an ambient- 
like formalism, entering and exiting ambients is not very useful without opening 
some ambients. 

4 The Abstract Machine, Formally 

Syntax. The syntax of PAN is shown in Table 1. A term of PAN, a net, is 
the parallel composition of agents and messages, with some names possibly re- 
stricted. An agent can be a located ambient or a forwarder. Located ambients are 
Table 1. The syntax of PAN 

a, b, ... ∈ Names     h, k, ... ∈ Locations     p, q, ... ∈ Names ∪ Locations 

Nets 
A ::= 0                      (empty) 
    | Agent                  (agent) 
    | h{MsgBody}             (message) 
    | A1 || A2               (composition) 
    | (νp)A                  (restriction) 

Agents 
Agent ::= h ▷ k              (forwarder) 
        | h: n[P]k           (located ambient) 

Message body 
MsgBody ::= Request          (request) 
          | Completion       (completion) 

Request ::= in n, h                  (the agent at h wants to enter n) 
          | $\overline{in}$ n, h     (the agent at h, named n, accepts someone in) 
          | out n, h                 (the agent at h wants to go out of n) 
          | $\overline{open}$ n, h   (the agent at h, named n, accepts to be opened) 

Completion ::= go h          (change the parent to be h) 
             | OKin          (request $\overline{in}$ accepted) 
             | migrate       (request $\overline{open}$ accepted) 
             | register P    (add P to the local processes) 

Process-related syntax: 
P ::= 0 | P1 | P2 | (νn)P | M.P | M[P] | ⟨M⟩ | (x)P | X | rec X.P | wait.P | {Request} 
M ::= x | n | in M | $\overline{in}$ M | out M | $\overline{out}$ M | open M | $\overline{open}$ M 

the basic unit of PAN, and represent ambients of SA with their local processes. 
The syntax of the processes inside located ambients is similar to that of processes 
in SA. The only additions are: the prefix wait. P, which appears in an ambient 
when this has sent a request to its parent but has not received an answer yet; 
and the requests, which represent messages received from the children and not 
yet served. We use A to range over nets. 

Semantics. The reduction relation of PAN, $\longmapsto$, from nets to nets, is defined 
by the rules below. The rules for local reductions, and the associated inference 
rule PAR-PROC, have a special format. We write $P \xrightarrow[h:n]{k} Q \succ Msg$ to mean that a 
process P, local to an ambient n that is located at h, and whose parent is located 
at k, becomes Q and, as a side effect, the messages in Msg are generated. We use 
Msg to indicate a possibly empty parallel composition of messages. For instance, 
if $P \xrightarrow[h:n]{k} Q \succ Msg$, then, using PROC-AGENT and PAR-AGENT, we have, for 
any net A: 

$A \parallel h{:}\,n[P]k \longmapsto A \parallel h{:}\,n[Q]k \parallel Msg$ 

When n or h or k are unimportant, we replace them with −, as in 
$P \xrightarrow[-]{-} Q \succ Msg$. The rule STRUCT-CONG makes use of the structural congruence relation 
≡, whose definition is similar to that for SA, and includes the standard rules for 
changing the order of parallel compositions and restrictions, and for unfolding 
recursions. 

The side condition of rule PAR-PROC ensures that all subambients of an 
ambient are activated as soon as possible, before any local reduction takes place 
(here we exploit the fact that recursions are guarded, otherwise there could be 
an infinite number of ambients to create). 



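Local reductions 

$\langle M \rangle \mid (x).P \xrightarrow[-]{-} P\{M/x\} \succ 0$   [LOCAL-COM] 

$\{in\ n, h\} \mid \{\overline{in}\ n, k\} \xrightarrow[-]{-} 0 \succ h\{go\ k\} \mid k\{OKin\}$   [LOCAL-IN] 

$\{out\ n, h\} \mid \overline{out}\ n.P \xrightarrow[-:n]{k} P \succ h\{go\ k\}$   [LOCAL-OUT] 

$open\ n.P \mid \{\overline{open}\ n, h\} \xrightarrow[-]{-} wait.P \succ h\{migrate\}$   [LOCAL-OPEN] 

Creation 

$h{:}\,n[\,m[P] \mid Q\,]h' \longmapsto h{:}\,n[Q]h' \parallel (\nu k)(k{:}\,m[P]h)$   [NEW-LOCAMB] 

$h{:}\,n[\,(\nu m)P\,]k \longmapsto (\nu m)(h{:}\,n[P]k)$   [NEW-RES] 

Forwarder 

$h \triangleright k \parallel h\{MsgBody\} \longmapsto h \triangleright k \parallel k\{MsgBody\}$   [FW-MSG] 

Consumption of request messages 

$h{:}\,n[P]h' \parallel h\{Request\} \longmapsto h{:}\,n[\,P \mid \{Request\}\,]h'$   [CONSUME-REQ] 

Emission of request messages (should be h ≠ root) 

$in\ m.P \xrightarrow[h:n]{k} wait.P \succ k\{in\ m, h\}$   [REQ-IN] 

$\overline{in}\ n.P \xrightarrow[h:n]{k} wait.P \succ k\{\overline{in}\ n, h\}$   [REQ-COIN] 

$out\ m.P \xrightarrow[h:n]{k} wait.P \succ k\{out\ m, h\}$   [REQ-OUT] 

$\overline{open}\ n.P \xrightarrow[h:n]{k} wait.P \succ k\{\overline{open}\ n, h\}$   [REQ-COOPEN] 

Consumption of completion messages 

$h{:}\,n[\,P \mid wait.Q\,]k \parallel h\{go\ h'\} \longmapsto h{:}\,n[\,P \mid Q\,]h'$   [COMPL-PARENT] 

$h{:}\,n[\,P \mid wait.Q\,]k \parallel h\{OKin\} \longmapsto h{:}\,n[\,P \mid Q\,]k$   [COMPL-COIN] 

$h{:}\,n[\,P \mid wait.Q\,]k \parallel h\{migrate\} \longmapsto h \triangleright k \parallel k\{register\ P \mid Q\}$   [COMPL-MIGR] 

$h{:}\,n[\,P \mid wait.Q\,]k \parallel h\{register\ R\} \longmapsto h{:}\,n[\,P \mid Q \mid R\,]k$   [COMPL-REG] 

Inference rules 

$\dfrac{P \longrightarrow P' \succ Msg \qquad Q \text{ does not have unguarded ambients}}{P \mid Q \longrightarrow P' \mid Q \succ Msg}$   [PAR-PROC] 

$\dfrac{P \xrightarrow[h:n]{k} P' \succ Msg}{h{:}\,n[P]k \longmapsto h{:}\,n[P']k \parallel Msg}$   [PROC-AGENT] 

$\dfrac{A \longmapsto A'}{(\nu p)A \longmapsto (\nu p)A'}$   [RES-AGENT] 

$\dfrac{A \longmapsto A'}{A \parallel B \longmapsto A' \parallel B}$   [PAR-AGENT] 

$\dfrac{A \equiv A' \qquad A' \longmapsto A'' \qquad A'' \equiv A'''}{A \longmapsto A'''}$   [STRUCT-CONG] 
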

5 Correctness of the Abstract Machine 

For lack of space we only report the main correctness result. We refer to the 
full version of the paper, or to the version on the authors’ Web page, for more 
details. 

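Let [[.]] be the translation of terms of SA into terms of PAN, so defined: 

[[P]] = root: rootname[P]rootparent 

We write $A \Downarrow_n$ if A is observable at n; this means, intuitively, that A contains 
an agent n that accepts interactions with the external environment. Formally: 
$A \downarrow_n$ if $A \equiv (\nu \tilde{p})(h{:}\,n[\,\mu.Q_1 \mid Q_2\,]root \parallel A')$ where $\mu \in \{\overline{in}\ n, \overline{open}\ n\}$ and $n \notin \tilde{p}$. 
Then, using $\Longmapsto$ for the reflexive and transitive closure of $\longmapsto$, we write $A \Downarrow_n$ if 
$A \Longmapsto A'$ for some A' with $A' \downarrow_n$. Observability in SA is defined similarly: $P \Downarrow_n$ if $P \Longrightarrow P'$, for some 
P' such that $P' \equiv (\nu \tilde{m})(n[\,\mu.Q_1 \mid Q_2\,] \mid Q_3)$ where $\mu \in \{\overline{in}\ n, \overline{open}\ n\}$ and $n \notin \tilde{m}$. 

Theorem 1. Let P ∈ SA. It holds that $P \Downarrow_n$ iff $[[P]] \Downarrow_n$, for all n. 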

The key steps in the proof of Theorem 1 are the following. First, since PAN 
separates between the logical and physical distribution of ambients, we need to 
make sure that the two are consistent. For instance, the graph of the dependen- 
cies among locations in the physical distribution, which represents the logical 
structure, should be a tree. We also need conditions that ensure that the wait 
state is used as described informally in previous sections. We therefore introduce 
the notion of well-formed nets and then prove that well-formedness is invariant 
under reductions. Secondly, we prove that on well-formed nets administrative 
reductions do not affect behavioural equivalences, where a reduction $A \longmapsto A'$ is 
administrative if its derivation proof does not use the axioms of local reductions. 
Thirdly, we establish an operational correspondence between the reductions of 
a well-typed SA process and of its PAN translation. 
6 Immobile Ambients 



The other important type of ambients in SA are the immobile ambients. (A 
typed SA program may therefore contain both single-threaded and immobile 
ambients.) These are ambients that: (i) cannot jump into or out of other ambi- 
ents; (ii) cannot be opened. Thus the only capabilities that an immobile ambient 
can exercise are $\overline{in}\ n$, $\overline{out}\ n$, and open n; several of them can be ready for 
execution at the same time. 

The same rules for the abstract machine in Section 4 could be adopted for 
immobile ambients. This has however the following problem. Consider the pro- 
cess 

$P = n[\,rec\ X.\,(\overline{in}\ n \mid (\nu m)(open\ m.X \mid m[\,\overline{open}\ m\,]))\,]$ 

(Using replication, the behaviour of P can be expressed as $n[\,!\,\overline{in}\ n\,]$.) With the 
rules of Section 4, ambient n could flood its parent with $\overline{in}$ requests. To avoid 
the problem, we modify PAR-PROC: 

$\dfrac{n \text{ is an immobile ambient} \qquad Q \text{ does not have unguarded ambients} \qquad P \longrightarrow P' \succ Msg \qquad Q \text{ or } P' \text{ do not contain any wait}}{P \mid Q \longrightarrow P' \mid Q \succ Msg}$   [IMM-PAR-PROC] 



We then have to modify also LOCAL-OPEN and COMPL-REG, so that an immo- 
bile ambient does not go into a wait state while opening a child ambient: 

$\dfrac{n \text{ is an immobile ambient}}{open\ m.P \mid \{\overline{open}\ m, h\} \xrightarrow[-:n]{-} P \succ h\{migrate\}}$   [IMM-LOCAL-OPEN] 

$\dfrac{n \text{ is an immobile ambient}}{h{:}\,n[P]k \parallel h\{register\ R\} \longmapsto h{:}\,n[\,P \mid R\,]k}$   [IMM-COMPL-REG] 

The original rules LOCAL-OPEN, PAR-PROC, and COMPL-REG are now used 
only for ST ambients; therefore the corresponding side condition is added. 

With the new rules, the following property holds (for both ST and immobile 
ambients): an agent can send only one request message at a time to its parent. 
An immobile ambient can exercise several capabilities at the same time. Sending 
one request at a time to the parent is correct because the only capability that 
may produce a request from an immobile ambient named n to its parent is $\overline{in}\ n$ 
(the protocol for $\overline{in}$ can however be executed in parallel with several protocols 
for $\overline{out}$ and open operations). With the new rules, the addition of immobile 
ambients requires few modifications to the correctness proof of Section 5. 



7 Comparisons and Remarks 

Cardelli has produced the first implementation, called Ambit, of an ambient- 
like language; it is a single-machine implementation of the untyped Ambient 
calculus, and is written in Java. The algorithms are based on locks: all the 
ambients involved in a movement (three ambients for an in or out movement, 
two for an open) have to be locked for the movement to take place. More recently, 
Fournet, Levy and Schmitt have presented a distributed implementation of 
the untyped Ambient calculus, as a translation of the calculus into Jocaml (a 
programming language based on the distributed Join Calculus). Our abstract 
machine is quite different from the above mentioned implementations mainly 
because: 

(i) We are implementing a variant of the Ambient calculus (the Safe Ambients) 
that has coactions and types for single-threadedness and immobility. 

(ii) We separate the logical and physical distribution of an ambient system. 

The combination of (i) and (ii) allows us considerable simplifications, both in the 
abstract machine and in its correctness proof. We are not aware of correctness 
proofs for Ambit. The correctness proof for the Join implementation is very 
ingenious and makes use of sophisticated techniques, such as coupled simulation 
and decreasing diagram techniques. Below, we focus on the differences with the 
Join implementation, which is a distributed implementation, and which we will 
refer to as AtJ (Ambients to Join). 

— In AtJ open is by far the most complex operation, because the underlying 
Jocaml language does not have primitives with a similar effect. In AtJ, every 
ambient has a manager that collects the requests of operations from the sub- 
ambients and from the local processes. If the ambient is opened, its manager 
becomes a forwarder of messages towards the parent ambient. The processes 
local to the opened ambient are not moved. 

As a consequence, in AtJ the processes local to an ambient can be distributed 
on several locations. Therefore, also the implementation of the communica- 
tion rule R-MSG may require exchange of messages among sites, which does 
not occur in PAN, where forwarders are always empty. 

— In AtJ, forwarders are also introduced with in and out operations, to cope 
with possible asynchronous messages still travelling after the move is finished. 
These forwarders are not needed in PAN. 

— In PAN, the presence of coactions dispenses us from having backward point- 
ers from an ambient to its children. In the example of the first row of Figure 
1, without $\overline{in}$, ambient c would not know the location of b and therefore 
could not communicate this location to a. Backward pointers, as in AtJ, 
make bookkeeping and correctness proof more complex. 

In PAN, the absence of backward pointers and the presence of coactions 
make the implementation of forms of dynamic linking straightforward: new 
machines hosting ambients can be connected to existing machines running an 
ambient system; it suffices that the new machines know the location of one 
of the running ambients; no modification or notification is needed to the 
running ambients themselves. 

— In PAN, since any moving ambient (an ambient that tries to enter or exit 
another ambient, or that can be opened) is single-threaded, each ambient 
requests at most one operation at a time to its parent. By contrast, in AtJ 
an ambient can send an unbounded number of requests to the parent (an 
example is n[!in m1 | !out m2]). 
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Moreover, due to this property, in PAN no ambient needs a log of pending 
requests received from a given children or sent to the parent. Without the 
property, both forms of log are needed, as it happens in AtJ. To see why, 
consider two ambients a and b, where b is the parent of a. If moving ambients 
can request several operations concurrently, b must of course keep a log of the 
pending requests from a. A copy of the same log must however be kept by a, 
because messages exchanged among ambients are asynchronous and therefore 
the following situation could arise. Suppose a requests two operations, say
in n and in m. The request for in n could reach b first. The request for in m
could reach b only when the movement for in n has been completed (indeed,
a might have completed other movements). The request in m must now be
resent to the new parent of a, but b does not possess this information. This
task must therefore be accomplished by a, which, for this, must have stored
in m in its log of pending requests to the parent.

The example also shows that, aside from message retransmission in forwarders,
some requests may have to be retransmitted several times, to different
parents (in the example, in m); in PAN every request is sent at most
once.

— In PAN, any movement for a given ambient is requested to the parent, which 
(assuming this is not a forwarder) makes decisions and gives authorisations; 
the grandparent is never contacted. This homogeneity property breaks in 
presence of backward pointers from an ambient to its children. For instance, 
the simulation of the out reduction in the second row of Figure 1 would 
then need also the involvement of the grandparent c. 

— In AtJ, the domain of physical distribution is a tree. The in and out op- 
erations produce physical movements in which an ambient, and all its tree 
of subambients, must move. To achieve this, the tree of ambients is first 
“frozen” so that all the activities in the ambients of the tree stop while the 
movement takes place. In PAN, where the domain of physical distribution 
is flat, in and out only give logical movement; no freezing of ambients is 
required. On the other hand, in PAN, but not in AtJ, open gives physical 
movement. 

— PAN is an abstract machine, and is therefore independent of a specific target 
language. 



8 Implementation Architecture 

Our implementation, written in Java, follows the definition of the abstract machine
(as usual in process calculi, rules for arbitrarily changing the order of parallel
components need some randomisation mechanism to be implemented; we do not
do this, which may reduce non-determinism). Perhaps the main difference is
that the implementation allows clustering of agents on the same IP node (i.e. a
physical machine). Therefore the implementation is made of three layers: agents,
nodes and the network. The address k of an agent is composed of the IP-name
of the node on which it resides, plus a suffix, which is different for each agent in
that node. Each agent is executed by an independent Java thread; the processes
local to an ambient are scheduled using a round-robin policy. Each agent knows
its name, its address, its parent’s address, and keeps a link to its node.

From a physical point of view, the messages exchanged between agents are of 
two kinds: local, when both agents reside on the same node, and remote, when 
two distinct nodes are involved. In each node a special Java RMI object, with 
its own thread of execution, takes care of inter-node communications. For this, 
nodes act alternately as clients (requesting that a message be sent to another
computer) and as servers (receiving a message and pushing it into a local mailbox).
The node layer is implemented using Java RMI and serialization, and the
network layer simply provides an IP-name registry for RMI communications to take
place (using the Java RMI registry).
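The distinction between local and remote messages can be illustrated with a small simulation. This is only a sketch in Python, not the Java RMI code of the implementation: the names `Address`, `Node`, `Network`, `send` and `receive` are hypothetical, and the remote call is modelled as a plain method invocation on the destination node.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """Agent address: the node's IP-name plus a suffix unique within that node."""
    node: str
    suffix: str


class Network:
    """Stand-in for the name registry: maps IP-names to nodes (hypothetical)."""

    def __init__(self):
        self.nodes = {}

    def register(self, node):
        self.nodes[node.ip_name] = node


class Node:
    def __init__(self, ip_name, network):
        self.ip_name = ip_name
        self.network = network
        self.mailboxes = defaultdict(deque)  # suffix -> queued messages
        self.remote_sent = 0                 # number of inter-node messages
        network.register(self)

    def send(self, dest, msg):
        """Client role: deliver locally, or hand the message to the remote node."""
        if dest.node == self.ip_name:
            self.mailboxes[dest.suffix].append(msg)           # local message
        else:
            self.remote_sent += 1
            self.network.nodes[dest.node].receive(dest, msg)  # remote message

    def receive(self, dest, msg):
        """Server role: push an incoming message into a local mailbox."""
        self.mailboxes[dest.suffix].append(msg)


net = Network()
n1, n2 = Node("host-a", net), Node("host-b", net)
b = Address("host-a", "2")
c = Address("host-b", "1")
n1.send(b, "in n")    # both agents on host-a: local
n1.send(c, "out m")   # destination on host-b: remote
```

Only the second `send` crosses a node boundary, which is the case that would go through the RMI object in the implementation described above.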

An agent acts as an interpreter for the ambient expressions that constitute its 
local processes. When the agent wants to create a subambient, it sends a special 
message to its node, which will spawn a new agent hosting the subambient code. 
We also allow remote creation of new agents: an agent may send a message to 
a node different from its own, to demand the creation of a subambient. This
corresponds to the addition of a primitive create n[P] at h, where h is the 
IP-name of a node, to the abstract machine. When the execution of an ambient 
expression begins on a given node, the first action is the local creation of a root 
agent. An agent resides on the same node until it is opened; then, its processes 
are serialised and sent via RMI to the parent agent. The implementation also 
allows dynamic linking of ambients, as hinted at in Section 0 



9 Further Developments 

In the abstract machine presented, a message may have to go through a chain of
forwarders before getting to its destination. A (partial) solution to this problem is
a modification of the rules that guarantees the following property: every agent
sends a message to a given forwarder at most once. The modification consists in
adding the source field to the completion messages h{OKin}, which thus become
h{OKin, k}, where k is the ambient that is authorising the move. Thus the rules
LOCAL-IN and COMPL-COIN become

$$\{\mathtt{in}\ n, h\} \mid \{\overline{\mathtt{in}}\ n, k\} \;\longmapsto\; 0 \mid h\{\mathtt{go}\ k\} \parallel k\{\mathtt{OKin}, h'\} \qquad \text{[LOCAL-IN2]}$$

$$h : n[P \mid \mathtt{wait}.Q]_k \parallel h\{\mathtt{OKin}, h'\} \;\longmapsto\; h : n[P \mid Q]_{h'} \qquad \text{[COMPL-COIN2]}$$

The reason why these rules may be useful is that the parent of an ambient 
that has sent a in request may have become a forwarder; thus the real parent 
is another ambient further up in the hierarchy. With the new rules, the parent 
of the ambient that has sent the in request is updated and hence this ambient 
will not go through the forwarder afterwards. With the other capabilities that
may originate a request from an ambient to its parent (open, out, in), the issue
does not arise, because either the requesting ambient is dissolved (open), or its
parent is anyway modified (out, in).
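The effect of the source field can be seen in a toy simulation. The sketch below is in Python and is not part of the abstract machine: `Agent`, `deliver` and `request` are hypothetical names, and the matching of requests is simplified so that the real parent authorises every request immediately. It only illustrates that, once the completion message carries the address of the authorising ambient, the requester stops routing through the forwarder.

```python
class Agent:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent     # the agent this one believes to be its parent
        self.forwarder = False   # True once the ambient has been opened
        self.inbox = []


def deliver(target, msg):
    """Follow forwarders upwards; return (recipient, forwarders traversed)."""
    hops = 0
    while target.forwarder:
        target = target.parent
        hops += 1
    target.inbox.append(msg)
    return target, hops


def request(child, capability):
    """Send a request to the parent and process the completion message."""
    real_parent, hops = deliver(child.parent, (capability, child.name))
    completion = ("OKin", real_parent)  # source field: the authorising ambient
    child.parent = completion[1]        # the modified rule updates the parent
    return hops


c = Agent("c")
b = Agent("b", parent=c)
a = Agent("a", parent=b)
b.forwarder = True            # b has been opened inside c

first = request(a, "coin")    # goes through the forwarder b
second = request(a, "coin")   # goes directly to c
print(first, second)          # 1 0
```

Without the update of `child.parent`, every later request from `a` would again traverse `b`.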

Even with the rules above, however, the forwarder introduced in an open
operation is permanent. We plan to study the problem of the garbage collection
of forwarders. We also plan to experiment with the addition of backward pointers,
from an ambient to its children; this should avoid the introduction of forwarders
in an open, but may complicate other parts of the abstract machine.

In the abstract machine, open is the only operation that gives movement 
of terms. Although at present we do not see the need of enhancing this, the 
modifications for allowing movement of terms also with in and out would be 
simple. The main price is the introduction of additional forwarders, as we have 
now in the open case. 

Acknowledgements. We have benefitted from comments by Jean-Jacques 
Levy and Alan Schmitt. 
Abstract. It has been argued that Boolean-valued logics and associated
discrete notions of behavioural equivalence sit uneasily with semantic
models featuring quantitative data, like probabilistic transition systems.
In this paper we present a pseudometric on a class of reactive proba- 
bilistic transition systems yielding a quantitative notion of behavioural 
equivalence. The pseudometric is defined via the terminal coalgebra of a 
functor based on the Hutchinson metric on the space of Borel probability 
measures on a metric space. We also characterize the distance between 
systems in terms of a real-valued modal logic. 



1 Introduction 

The majority of verification methods for concurrent systems only produce qual- 
itative information. Questions like “Does the system satisfy its specification?” 
and “Do the systems behave the same?” are answered “Yes” or “No”. Huth
and Kwiatkowska [?] and Desharnais, Gupta, Jagadeesan and Panangaden [?]
have pointed out that such discrete Boolean-valued reasoning sits uneasily with
continuous semantic models like probabilistic transition systems. For instance,
the probabilistic modal logic of Larsen and Skou [?] adds probability thresholds
to traditional modal logic. In this logic one has a formula like $\Diamond_q \varphi$ which is
satisfied if the sum of the probabilities of transitions to states satisfying $\varphi$ exceeds
$q \in [0, 1]$. Such a formalism does not support approximate reasoning: any
inexactness in the calculation of the semantics of $\varphi$ may result in an incorrect
conclusion as to the truth or falsity of $\Diamond_q \varphi$. This is particularly problematic if
one wants to reason about infinite state systems in terms of finite approximants.

In a similar vein, Desharnais et al. [?] and Giacalone, Jou and Smolka [?]
have criticized all-or-nothing notions of operational equivalence for probabilistic
systems such as Larsen and Skou’s probabilistic bisimulation [?]. Recall that
a probabilistic bisimulation is an equivalence relation on the state space of a
transition system such that related states have exactly the same probability of
making a transition into any equivalence class. Thus, for instance, the probabilistic
transition systems
are only probabilistic bisimilar if $\varepsilon$ is 0. However, the two systems behave almost
the same for very small $\varepsilon$ different from 0. In the words of [?], behavioural equivalences
like probabilistic bisimilarity are not robust, since they are too sensitive
to the exact probabilities of the various transitions.

To address some of the issues raised above, Huth and Kwiatkowska introduce
a non-standard semantics for formulas of the modal $\mu$-calculus over a probabilistic
transition system. Formulas take truth values in the unit interval [0, 1]: in
particular the modal connective is interpreted by integration. A related though
quite distinct real-valued modal logic is introduced by Desharnais et al. [?].
Their logic is used to define a notion of approximate equivalence for probabilistic
transition systems; this is formalized as a pseudometric¹ on the class of
all such systems. The pseudometric is intended to provide for compositional reasoning
about the approximate equivalence of concurrent interacting probabilistic
systems. Processes are at 0 distance just in case they are probabilistic bisimilar.

Many different kinds of transition system can be viewed as coalgebras; Rutten
[?] provides numerous examples. De Vink and Rutten [22] have shown that both
discrete and continuous (labelled) probabilistic transition systems can be seen 
as coalgebras. By viewing these systems as coalgebras one can transfer results 
from the theory of coalgebra to the setting of probabilistic systems. This theory 
includes a general definition of bisimilarity which De Vink and Rutten studied 
for probabilistic transition systems with discrete and ultrametric state spaces. 

In this paper we obtain a metric-space domain for reactive probabilistic processes
as the terminal coalgebra of an endofunctor $F$ on the category of pseudometric
spaces and nonexpansive maps. The definition of $F$ is based on the
Hutchinson metric on probability measures [14]. $F$-coalgebras can be seen as reactive
probabilistic transition systems with discrete or continuous state spaces.
Unlike the terminal coalgebras studied by De Vink and Rutten [22] and Baier and
Kwiatkowska [5], the metric on our domain varies continuously with transition
probabilities. It provides for a notion of approximate equivalence of probabilistic
processes similar to the pseudometric of Desharnais et al. mentioned above. In
fact, we define a pseudometric on the state space of a reactive transition system
(seen as an $F$-coalgebra) as the metric kernel of the unique map to the terminal
$F$-coalgebra. That is, the distance between two states is the distance between
their images under the unique map to the terminal coalgebra. We show that our
pseudometric can also be obtained by adding negation to the logic of Desharnais
et al. Furthermore we compare our pseudometric with the distance functions of
De Vink and Rutten and of Norman.

¹ A pseudometric differs from an ordinary metric in that different elements can have
distance 0.



2 The Pseudometric 

In this section we introduce an endofunctor on the category of pseudometric 
spaces and nonexpansive maps based on the Hutchinson metric. We prove that 
the functor has a terminal coalgebra, and we use this to define our pseudometric 
on reactive probabilistic transition systems. 

In [14], Hutchinson introduced a metric on the set of Borel probability measures
on a metric space. Here, we generalize his definition to pseudometric spaces.
We restrict ourselves to spaces whose points have distance at most 1, since they
serve our purpose and simplify the definition of the distance function a little. Let
$X$ be a 1-bounded pseudometric space. We denote the set of Borel probability
measures on $X$ by $\mathcal{M}(X)$.

Definition 1. The Hutchinson metric² $d_{\mathcal{M}(X)} : \mathcal{M}(X) \times \mathcal{M}(X) \to [0, 1]$ is
defined by

$$d_{\mathcal{M}(X)}(\mu_1, \mu_2) = \sup \left\{ \int_X f \, d\mu_1 - \int_X f \, d\mu_2 \;\middle|\; f : X \to [0, 1] \text{ nonexpansive} \right\}.$$

A function is nonexpansive if it does not increase any distances. For a proof that
$d_{\mathcal{M}(X)}$ is a 1-bounded pseudometric we refer the reader to, for example, Edgar’s
textbook [?, Proposition 2.5.14].
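As a sanity check on the definition, consider the special case where $X$ is finite and carries the discrete metric (distinct points are at distance 1). Then every function from $X$ to $[0, 1]$ is nonexpansive, the supremum is attained by a 0/1-valued function, and the distance reduces to the total mass by which $\mu_1$ exceeds $\mu_2$ pointwise. The Python sketch below covers only this special case (the function names are ours, not the paper's); it is not an algorithm for general pseudometric spaces.

```python
from fractions import Fraction as Fr
from itertools import product


def hutchinson_discrete(mu1, mu2):
    """sup over all f: X -> [0, 1] of (integral of f d mu1) - (integral of f d mu2)."""
    points = set(mu1) | set(mu2)
    return sum(max(mu1.get(x, 0) - mu2.get(x, 0), 0) for x in points)


def brute_force(mu1, mu2):
    """The same supremum, restricted to the 0/1-valued functions (extreme points)."""
    points = sorted(set(mu1) | set(mu2))
    best = 0
    for values in product((0, 1), repeat=len(points)):
        f = dict(zip(points, values))
        diff = sum(f[x] * (mu1.get(x, 0) - mu2.get(x, 0)) for x in points)
        best = max(best, diff)
    return best


mu1 = {"s": Fr(1, 2), "t": Fr(1, 2)}
mu2 = {"s": Fr(1, 4), "t": Fr(1, 4), "u": Fr(1, 2)}
print(hutchinson_discrete(mu1, mu2))  # 1/2
print(brute_force(mu1, mu2))          # 1/2
```

Exact rational arithmetic is used so that the two computations can be compared without rounding issues.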

In a pseudometric space, compactness is a natural generalization of finiteness. 
In the rest of this paper, we focus on Borel probability measures which are 
completely determined by their values for the compact subsets of the space X. 

Definition 2. A Borel probability measure $\mu$ on $X$ is tight if for all $\varepsilon > 0$ there
exists a compact subset $K_\varepsilon$ of $X$ such that $\mu(X \setminus K_\varepsilon) < \varepsilon$.

Under quite mild conditions on the space, for example, completeness and separability,
every measure is tight (see, for example, Parthasarathy’s textbook [?,
Theorem II.3.2]). We denote the set of tight Borel probability measures on $X$
by $\mathcal{M}_t(X)$. We are interested in these tight measures because of the following

Theorem 1.

1. $X$ is complete if and only if $\mathcal{M}_t(X)$ is complete.

2. $X$ is compact if and only if $\mathcal{M}_t(X)$ is compact.

Proof. See, for example, [?, Theorem 2.5.25]. □



² The Hutchinson metric is also known as the Kantorovich metric.

$\mathcal{M}_t$ can be extended to an endofunctor on the category $\mathcal{PM}et_1$ of 1-bounded
pseudometric spaces and nonexpansive functions as follows. Let $X$ and $Y$ be
1-bounded pseudometric spaces. Let $f : X \to Y$ be a nonexpansive function.

Definition 3. The function $\mathcal{M}_t(f) : \mathcal{M}_t(X) \to \mathcal{M}_t(Y)$ is defined by

$$\mathcal{M}_t(f)(\mu) = \mu \circ f^{-1}.$$

It is readily verified that the measure $\mathcal{M}_t(f)(\mu)$ is tight, that the function $\mathcal{M}_t(f)$
is nonexpansive and that the action of $\mathcal{M}_t$ on arrows is functorial.
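For finitely supported measures the action on arrows can be made concrete. The sketch below assumes the usual image-measure reading of the definition, $\mu \circ f^{-1}$, and represents a measure as a Python dictionary from points to weights; the helper name `pushforward` is ours. It also checks functoriality on one example: pushing forward along $f$ and then $g$ agrees with pushing forward along $g \circ f$.

```python
from collections import defaultdict
from fractions import Fraction as Fr


def pushforward(f, mu):
    """Image measure of a finitely supported measure mu under the function f."""
    image = defaultdict(Fr)          # missing points start with weight 0
    for x, p in mu.items():
        image[f(x)] += p             # the mass of x moves to f(x)
    return dict(image)


def parity(x):
    return x % 2


def flip(y):
    return 1 - y


mu = {0: Fr(1, 4), 1: Fr(1, 4), 2: Fr(1, 2)}
nu = pushforward(parity, mu)                          # weights 3/4 on 0 and 1/4 on 1
twice = pushforward(flip, nu)                         # first parity, then flip
direct = pushforward(lambda x: flip(parity(x)), mu)   # the composite in one step
print(twice == direct)                                # True
```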

Next we state and explain a property of $\mathcal{M}_t$ which will later allow us to
exploit the terminal coalgebra theorem of Turi and Rutten [?].

Proposition 1. The functor $\mathcal{M}_t$ is locally nonexpansive: for all nonexpansive
functions $f_1, f_2 : X \to Y$,

$$d_{\mathcal{M}_t(X) \to \mathcal{M}_t(Y)}(\mathcal{M}_t(f_1), \mathcal{M}_t(f_2)) \le d_{X \to Y}(f_1, f_2).$$

□

A continuous probabilistic transition system with label set $L$ consists of a 1-bounded
pseudometric space $X$ of states together with a Borel subprobability
measure $\mu_{l,x}$ for each label $l$ and state $x$. The transition function $\mu_l$ is a conditional
probability determining the reaction of the system to an action $l$ selected
by the environment: $\mu_{l,x}$ assigns to each Borel set $B \subseteq X$ the probability that
the system makes a transition to a state in the set $B$ given that it was in the
state $x$ before the action $l$. We consider Borel subprobability measures³ to allow
for the possibility that the system may refuse $l$. We also require that for each
Borel set $B$ the map $x \mapsto \mu_{l,x}(B)$ is measurable, i.e. that $\lambda x. \lambda B. \mu_{l,x}(B)$ is a
stochastic kernel. This is the so-called reactive model of probabilistic systems.
For a detailed discussion of the importance of studying these continuous systems,
rather than concentrating on the discrete ones, we refer the reader to the work
of Desharnais et al. [?]. In the present paper we stick to discrete systems when
we come to exemplify our work. Also, for ease of exposition, we only consider
unlabelled transition systems. That is, we assume $L$ is a singleton space and
write $\mu_x$ for $\mu_{l,x}$. Our results extend easily to the labelled case.

A discrete probabilistic transition system is just a special case of a continuous 
one where the metric on the state space is discrete and the transition probability 
is given by a subprobability distribution. We can picture such a system as a 
directed graph with arcs labelled by probabilities: there is no need to mention 
Borel sets. 

Next, we demonstrate that a large class of continuous probabilistic transition 
systems can be viewed as coalgebras. But first we review some basic notions. 

Definition 4. Let $\mathcal{C}$ be a category. Let $F : \mathcal{C} \to \mathcal{C}$ be a functor. An $F$-coalgebra
consists of an object $C$ in $\mathcal{C}$ together with an arrow $f : C \to F(C)$ in $\mathcal{C}$. An
$F$-homomorphism from $F$-coalgebra $(C, f)$ to $F$-coalgebra $(D, g)$ is an arrow
$h : C \to D$ in $\mathcal{C}$ such that $F(h) \circ f = g \circ h$.

$$\begin{array}{ccc} C & \xrightarrow{\;h\;} & D \\ {\scriptstyle f} \downarrow & & \downarrow {\scriptstyle g} \\ F(C) & \xrightarrow{\;F(h)\;} & F(D) \end{array}$$

The $F$-coalgebras and $F$-homomorphisms form a category. The terminal object
in this category, if it exists, is called the terminal $F$-coalgebra.

³ In a subprobability measure $\mu$ we have that $\mu(X) \le 1$ rather than $\mu(X) = 1$.

We consider the functor

$$F = \tfrac{1}{2} \cdot \mathcal{M}_t(1 + -) : \mathcal{PM}et_1 \to \mathcal{PM}et_1,$$

where 1 is the terminal object⁴ functor, + is the coproduct⁵ functor, $\mathcal{M}_t$ is
the Hutchinson functor introduced above, and $\tfrac{1}{2} \cdot$ is the scaling⁶ functor. An
$F$-coalgebra consists of a 1-bounded pseudometric space $X$ together with a nonexpansive
function $\mu : X \to \tfrac{1}{2} \cdot \mathcal{M}_t(1 + X)$. A continuous probabilistic transition
system such that

• for all states $x$, the Borel probability measure $\mu_x$ is tight, and

• for all states $x_1$, $x_2$, $\tfrac{1}{2} \cdot d_{\mathcal{M}_t(1 + X)}(\mu_{x_1}, \mu_{x_2}) \le d_X(x_1, x_2)$,

can be viewed as an $F$-coalgebra. For now we observe that this class certainly
includes all discrete probabilistic transition systems, and we refer the reader
forward to the conclusion for further discussion of these two restrictions.

Theorem 2. There exists a terminal $F$-coalgebra.

Proof. Since the functors 1, +, and $\mathcal{M}_t$ are locally nonexpansive (Proposition 1)
and the scaling functor $\tfrac{1}{2} \cdot$ is locally contractive, the functor $F$ is locally contractive.
According to Theorem 1, the functor $\mathcal{M}_t$, and hence the functor $F$,
preserves the subcategory $\mathcal{CM}et_1$ of 1-bounded complete metric spaces and nonexpansive
functions. According to [?, Theorem 7.3], this functor restricted to
$\mathcal{CM}et_1$ has a terminal coalgebra $(\mathrm{fix}(F), \iota)$. It is not too hard to see from the
proof of that theorem that $(\mathrm{fix}(F), \iota)$ is also a terminal $F$-coalgebra. □

⁴ The terminal object of $\mathcal{PM}et_1$ is the singleton space.

⁵ The coproduct object of the objects $X$ and $Y$ in $\mathcal{PM}et_1$ is the disjoint union of the
sets underlying the spaces $X$ and $Y$ endowed with the pseudometric

$$d_{X+Y}(v, w) = \begin{cases} d_X(v, w) & \text{if } v \in X \text{ and } w \in X \\ d_Y(v, w) & \text{if } v \in Y \text{ and } w \in Y \\ 1 & \text{otherwise.} \end{cases}$$

⁶ The scaling by $\tfrac{1}{2} \cdot$ of an object in $\mathcal{PM}et_1$ leaves the set unchanged and multiplies
all distances by a half.
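The coproduct pseudometric and the scaling described in these footnotes are easy to state as functions on distance functions. The following Python sketch uses a hypothetical encoding in which points of the disjoint union are tagged pairs `("X", a)` or `("Y", b)`; the names `coproduct` and `scale_half` are ours.

```python
from fractions import Fraction as Fr


def coproduct(d_x, d_y):
    """Pseudometric on the disjoint union of two 1-bounded spaces."""
    def d(v, w):
        (tag_v, a), (tag_w, b) = v, w
        if tag_v != tag_w:
            return Fr(1)             # points of different summands are at distance 1
        return d_x(a, b) if tag_v == "X" else d_y(a, b)
    return d


def scale_half(d):
    """Same underlying set, all distances multiplied by one half."""
    return lambda v, w: d(v, w) / 2


def d_unit(a, b):
    """Euclidean distance on the interval [0, 1], a 1-bounded metric."""
    return abs(a - b)


def d_point(a, b):
    """The singleton space: its only distance is 0."""
    return Fr(0)


d = scale_half(coproduct(d_unit, d_point))
print(d(("X", Fr(1, 4)), ("X", Fr(3, 4))))  # 1/4
print(d(("X", Fr(1, 4)), ("Y", "*")))       # 1/2
```

After scaling, no two points are more than one half apart, which is the source of the contraction used in the proof of Theorem 2.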
The distance in the terminal coalgebra is a trade-off between the depth of observations
needed to distinguish systems, and the amount each observation differentiates
the systems. The relative weight given to these two factors is determined
by the contraction introduced in the definition of the functor $F$.

Now we present the definition of our pseudometric on probabilistic transition
systems. Instead of directly defining a pseudometric on systems, we define it on
the states of a system. The distance between two systems can be obtained by
combining the two systems into one and taking the distance between the initial
states of the original systems in the combined one. For a continuous probabilistic
system represented by the $F$-coalgebra $(X, \mu)$, let us write $[\![-]\!]_{(X,\mu)}$ for the unique
map $(X, \mu) \to (\mathrm{fix}(F), \iota)$.

Definition 5. The distance function $d_B : X \times X \to [0, 1]$ is defined by

$$d_B(x_1, x_2) = d_{\mathrm{fix}(F)}\left([\![x_1]\!]_{(X,\mu)}, [\![x_2]\!]_{(X,\mu)}\right).$$

Note that we now have two pseudometrics on the state space $X$: the original
pseudometric $d_X$ which defines the Borel sets and the above introduced pseudometric
$d_B$ which captures the difference in behaviour in a quantitative way.
Since the function $[\![-]\!]_{(X,\mu)}$ is nonexpansive, the $d_X$-distances are greater than
or equal to the $d_B$-distances.

3 Desharnais, Gupta, Jagadeesan, and Panangaden 

We compare our pseudometric with the one introduced by Desharnais, Gupta,
Jagadeesan and Panangaden in [?]. We argue that our distances are more intuitive
than theirs. Furthermore, we extend one of their definitions a little and
show that the pseudometric so obtained coincides with ours.

Consider the following three probabilistic transition systems. 




The first system terminates with probability 0, the second one with probabil- 
ity ^ and the third one with probability 1. The probability that the systems 
make, for example, at most three transitions is 0, ^ and respectively. Based 
on this kind of observation, one may infer that the first system behaves more
like the second one than the third one. This is reflected by our pseudometric, 
since the first and second system are ^ apart whereas the first and third system 
are at distance However, in the pseudometric introduced by Desharnais et 
al. both the first and the second system and the first and third system are ^ 
apart. 

Desharnais et al. defined their pseudometric in terms of a real-valued logic.
Their work builds on an idea of Kozen [?] to generalize logic to handle probabilistic
phenomena. An extension of their real-valued modal logic is introduced
in the following definition.

Definition 6. The set $\mathcal{F}$ of functional expressions is defined by

$$f ::= 1 \mid \Diamond f \mid \max(f, f) \mid 1 - f \mid f \mathbin{\dot{-}} q$$

where $q$ is a rational in $[0, 1]$.

Informally, there is the following correspondence between functional expressions
and formulae in the probabilistic modal logic of Larsen and Skou (see also
[?]). True is represented by 1, disjunction is represented by max, negation by $1 -$,
and the connective $\Diamond_q$ decomposes as $\Diamond$ and $\dot{-}\, q$. The main difference between the
above definition of functional expressions and the one presented by Desharnais
et al. is the presence of negation.⁷

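Given a continuous probabilistic transition system represented by the $F$-coalgebra
$(X, \mu)$, each functional expression $f$ can be interpreted as a function
$f_{(X,\mu)}$ from $X$ to $[0, 1]$ as follows.

Definition 7. For each $f \in \mathcal{F}$, the function $f_{(X,\mu)} : X \to [0, 1]$ is defined by

$$\begin{aligned}
1_{(X,\mu)}(x) &= 1 \\
(\Diamond f)_{(X,\mu)}(x) &= \tfrac{1}{2} \int_X f_{(X,\mu)} \, d\mu_x \\
(\max(f, g))_{(X,\mu)}(x) &= \max\left(f_{(X,\mu)}(x), g_{(X,\mu)}(x)\right) \\
(1 - f)_{(X,\mu)}(x) &= 1 - f_{(X,\mu)}(x) \\
(f \mathbin{\dot{-}} q)_{(X,\mu)}(x) &= f_{(X,\mu)}(x) \mathbin{\dot{-}} q
\end{aligned}$$

where

$$r \mathbin{\dot{-}} q = \begin{cases} r - q & \text{if } r \ge q \\ 0 & \text{otherwise.} \end{cases}$$

It is readily verified that for all $f \in \mathcal{F}$ the function $f_{(X,\mu)}$ is nonexpansive, and
hence measurable. The functional expressions induce a pseudometric as follows.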
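For a finite discrete system the interpretation of functional expressions is a short recursive function. The sketch below is an illustration under stated assumptions rather than the authors' code: expressions are encoded as nested tuples (a hypothetical encoding), `mu[x]` is a dictionary giving the subprobability distribution of state `x`, and the modal clause is read as one half of the expected value of `f` after one transition, in line with the scaling by one half in the functor.

```python
from fractions import Fraction as Fr

# Hypothetical encoding of functional expressions as nested tuples:
#   ("one",), ("dia", f), ("max", f, g), ("neg", f), ("minus", f, q)


def interp(f, mu, x):
    """Value in [0, 1] of the expression f at state x of the system mu."""
    tag = f[0]
    if tag == "one":
        return Fr(1)
    if tag == "dia":    # half the expected value of f[1] after one transition
        return Fr(1, 2) * sum(p * interp(f[1], mu, y) for y, p in mu[x].items())
    if tag == "max":
        return max(interp(f[1], mu, x), interp(f[2], mu, x))
    if tag == "neg":
        return 1 - interp(f[1], mu, x)
    if tag == "minus":  # truncated subtraction of the rational f[2]
        return max(interp(f[1], mu, x) - f[2], Fr(0))
    raise ValueError(f"unknown expression {f!r}")


def minimum(f, g):
    """min is derivable from max and negation."""
    return ("neg", ("max", ("neg", f), ("neg", g)))


mu = {
    "x0": {"x1": Fr(1, 2), "x2": Fr(1, 2)},
    "x1": {"x1": Fr(1)},
    "x2": {},           # x2 has no transitions
}
one = ("one",)
print(interp(("dia", one), mu, "x0"))                      # 1/2
print(interp(("dia", ("dia", one)), mu, "x0"))             # 1/8
print(interp(("minus", ("dia", one), Fr(1, 4)), mu, "x0"))  # 1/4
```

The difference of the values of one expression at two states, for instance `("dia", one)` at `x1` and `x2`, gives a lower bound on the distance between those states induced by the expressions.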

Definition 8. The distance function $d_{DGJP} : X \times X \to [0, 1]$ is defined by

$$d_{DGJP}(x_1, x_2) = \sup_{f \in \mathcal{F}} f_{(X,\mu)}(x_1) - f_{(X,\mu)}(x_2).$$

Clearly, the above introduced distance function is a 1-bounded pseudometric.
Now we have three different distance functions on the state space $X$: $d_X$, $d_B$
and $d_{DGJP}$. To distinguish these three pseudometric spaces we denote them by
$(X, d_X)$, $(X, d_B)$ and $(X, d_{DGJP})$. Since the functions $f_{(X,\mu)}$ are nonexpansive,
the $d_X$-distances are greater than or equal to the $d_{DGJP}$-distances.

In the rest of this section, we give an outline of a proof that $d_B$ and $d_{DGJP}$
coincide. In fact we concentrate on proving the inequality $d_B \le d_{DGJP}$, the
converse being more straightforward. To this end we introduce a transition function
$\mu'$ such that $((X, d_{DGJP}), \mu')$ is an $F$-coalgebra. Since the $d_X$-distances are
greater than or equal to the $d_{DGJP}$-distances, every Borel set on $1 + (X, d_{DGJP})$
is a Borel set on $1 + (X, d_X)$. Therefore, we can define for every $x \in X$ the Borel
probability measure $\mu'_x$ as $\mu_x$ restricted to the Borel sets on $1 + (X, d_{DGJP})$. Of
course, we have to check that $\mu'$ is an arrow in $\mathcal{PM}et_1$.

⁷ In a draft version, but not in the final version, of [?] negation was considered.
Proposition 2. The function $\mu'$ is nonexpansive.

Proof. Let $\varepsilon > 0$. Let $x \in X$. Since the measure $\mu_x$ is tight, there exists a compact
subset $K$ of $(X, d_X)$ such that $\mu_x(X \setminus K) < \varepsilon$. Since the $d_X$-distances are greater
than or equal to the $d_{DGJP}$-distances, $K$ is also a compact subset of $(X, d_{DGJP})$.
From the definition of $\mu'$ we can conclude that $\mu'_x(X \setminus K) < \varepsilon$.

Let $g : X \to [0, 1]$ be a function which is nonexpansive with respect to $d_{DGJP}$.
Then there exists a functional expression $f$ such that $g \restriction K$ and $f_{(X,\mu)} \restriction K$ are
at most $\varepsilon$ apart. This can be proved by exploiting [?, Lemma A.7.2].⁸ Using all
the above, we can show that

$$\int_X g \, d\mu'_x \text{ and } \int_X f_{(X,\mu)} \, d\mu_x \text{ are at most } 3\varepsilon \text{ apart.} \qquad (1)$$

Let $x_1, x_2 \in X$. Without loss of generality, we may assume that $\mu'_{x_1}(1) \le \mu'_{x_2}(1)$.
Hence,

$$\begin{aligned}
& \tfrac{1}{2} \cdot d_{\mathcal{M}_t(1 + (X, d_{DGJP}))}(\mu'_{x_1}, \mu'_{x_2}) \\
&= \tfrac{1}{2} \cdot \sup \left\{ \int_X g \, d\mu'_{x_1} - \int_X g \, d\mu'_{x_2} \;\middle|\; g : (X, d_{DGJP}) \to [0, 1] \text{ is nonexpansive} \right\} \\
&\le \tfrac{1}{2} \cdot \sup_{f \in \mathcal{F}} \int_X f_{(X,\mu)} \, d\mu_{x_1} - \int_X f_{(X,\mu)} \, d\mu_{x_2} \qquad [(1)] \\
&= \sup_{f \in \mathcal{F}} (\Diamond f)_{(X,\mu)}(x_1) - (\Diamond f)_{(X,\mu)}(x_2) \\
&\le \sup_{f \in \mathcal{F}} f_{(X,\mu)}(x_1) - f_{(X,\mu)}(x_2) \\
&= d_{DGJP}(x_1, x_2),
\end{aligned}$$

that is, $\mu'$ is nonexpansive. □

Note that $\Diamond$, max and min (which is a combination of max and $1 -$) play a role
in the above proof. Also $\dot{-}\, q$ and 1 are needed in some of the details of the proof
which are not presented here.

One can easily verify that the nonexpansive function $i$ from $(X, d_X)$ to
$(X, d_{DGJP})$ mapping $x$ to $x$ is an $F$-homomorphism.

[Diagram: the commuting diagram relating $(X, d_X)$, $(X, d_{DGJP})$ and $\mathrm{fix}(F)$ via $i$, $F(i)$ and the unique maps to the terminal coalgebra.]

Hence, $[\![-]\!]_{(X,\mu)}$ and $[\![-]\!]_{(X,\mu')} \circ i$ are both $F$-homomorphisms from $(X, d_X)$ to
$\mathrm{fix}(F)$. Since $\mathrm{fix}(F)$ is terminal they are equal, i.e. for all $x \in X$,

$$[\![x]\!]_{(X,\mu)} = [\![x]\!]_{(X,\mu')}. \qquad (2)$$

Theorem 3. For all $x_1, x_2 \in X$, $d_B(x_1, x_2) \le d_{DGJP}(x_1, x_2)$.

Proof.

$$\begin{aligned}
d_B(x_1, x_2) &= d_{\mathrm{fix}(F)}\left([\![x_1]\!]_{(X,\mu)}, [\![x_2]\!]_{(X,\mu)}\right) \\
&= d_{\mathrm{fix}(F)}\left([\![x_1]\!]_{(X,\mu')}, [\![x_2]\!]_{(X,\mu')}\right) \qquad [(2)] \\
&\le d_{DGJP}(x_1, x_2) \qquad [[\![-]\!]_{(X,\mu')} \text{ is nonexpansive}]
\end{aligned}$$

□

⁸ Let $K$ be a compact Hausdorff space. Let $A$ be a set of real-valued continuous
functions on $K$ such that $f \in A$ and $g \in A$ implies $\max(f, g) \in A$ and $\min(f, g) \in A$.
If a function $f$ can be approximated at each pair of points by functions in $A$ then $f$
is in the closure of $A$.

Thus we have shown that our pseudometric can also be characterized by a real- 
valued modal logic similar to the one studied by Desharnais et al. 

4 De Vink and Rutten 

We make another comparison, this time with the distance function introduced
by De Vink and Rutten in [22]. Remarks similar to the ones made below about
their distance function apply also to the distance functions presented by Baier
and Kwiatkowska [3] and Den Hartog [12].

Consider the following two probabilistic transition systems. 




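[Figure: two probabilistic transition systems with states $x_0$, $x_1$, $x_2$, $x_3$ and $y_0$, $y_1$, $y_2$, $y_3$, whose transition probabilities involve $\varepsilon$.]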

Clearly, the smaller e is, the more alike these systems behave. Our pseudometric 
captures this since dp (xo,yo) = f- However, in De Vink and Rutten’s setting 
these systems are ^ apart if e yf 0 . More generally, the distance between two 
systems in their setting is 2 “”“^ where n is the depth of probabilistic bisimilarity 
between them. 

De Vink and Rutten consider the functor 

G = l+Mc(^ CUMeti ^CUMeti, 

where Aic denotes the Borel probability measures with compact support. The 
main differences between our functor F and their functor G are the following. 
• They consider a distance function on Borel probability measures [22, Definition 5.3]
different from the one of Hutchinson (Definition 1). Their distance
function only captures qualitative information as the above example illustrates.

• They consider the category $\mathcal{CUM}et_1$ of 1-bounded complete ultrametric
spaces and nonexpansive functions whereas we consider the considerably
larger category $\mathcal{PM}et_1$. This allows us to capture many more interesting
continuous probabilistic transition systems as coalgebras, including systems
where the state space is the real interval [0, 1] endowed with the Euclidean
metric.

• They consider Borel probability measures with compact support whereas we
consider the more general tight Borel probability measures. Again this allows
us to represent more systems as coalgebras.

• Their model only allows processes to refuse transitions with probability 0 or 1.

We have generalized all the results for the functor $G$ in [22, Section 5]⁹ to our
setting.



5 Norman 

We compare our pseudometric with the pseudometric introduced by Norman in
[?, Section 6.1]. Consider the following two probabilistic transition systems.



[Figure: two probabilistic transition systems with states $x_0$, ..., $x_4$ and $y_0$, ..., $y_5$.]



These systems are not probabilistic bisimilar. In Norman’s pseudometric the 
systems have distance 0. In our pseudometric, systems only have distance 0 if 
they are probabilistic bisimilar. In our setting the systems are ^ apart. This 
example also shows that his pseudometric gives rise to a topology different from 
ours. 

The main differences between his and our pseudometric are the following. 

• He uses a linear-time model whereas we consider a branching-time model. 

• He considers only discrete probabilistic transition systems whereas we also 
consider continuous ones. 

• We use the usual categorical machinery and various standard constructions 
whereas his definitions are more ad-hoc. We believe however that his pseu- 
dometric can also be characterized by means of a terminal coalgebra. 



⁹ The proof of [22, Theorem 5.8] is incomplete. We also have no proof for this result
in our setting.

6 Conclusion

In this paper, we have presented a new pseudometric on a class of probabilistic
transition systems. The pseudometric was defined via the terminal coalgebra
of a functor based on the Hutchinson metric on the space of Borel probability 
measures on a pseudometric space. We also characterized the distance between 
systems in terms of a real-valued modal logic. Similar results have been presented 
by the second author in his thesis in the setting of bimodules and generalized 
metric spaces. 

Let us isolate two distinct consequences of our use of the Hutchinson met- 
ric. We can talk about approximate equivalence of processes and we can model 
continuous-state systems as coalgebras. An apparent restriction with regard to
the latter point is the requirement that the structure map of an $F$-coalgebra
be nonexpansive. Properly speaking, continuous probabilistic transition systems
as formulated in Section 2 are coalgebras of (a variant of) the Giry monad on
the category of measurable spaces [?]. However, we conjecture that the terminal
$F$-coalgebra $(\mathrm{fix}(F), \iota)$ is also terminal when seen as a coalgebra of the Giry
functor, and that our results can be extended to continuous-state systems in
general.

Exploiting Theorem 1 and some results by Alessi et al. [2] we have shown
that our terminal coalgebra is compact and hence separable. Furthermore we
have shown that the unique map from the initial algebra of a finitary version
of $F$ (representing finite discrete probabilistic transition systems with rational
probabilities) to the terminal $F$-coalgebra is a dense embedding. Hence, every
continuous system can be approximated by a finite one (see also [?]).

Making use of linear programming, we have developed an algorithm that
calculates our distance between finite state systems to a prescribed degree of
accuracy in polynomial time, cf. [?].

Many system combinators can be shown to be nonexpansive with respect to 
our pseudometric. This quantitative analogue of congruence allows for composi- 
tional verification (see also [/II llj b 
Abstract. A “based” plane triangulation is a plane triangulation with 
one designated edge on the outer face. In this paper we give a simple 
algorithm to generate all biconnected based plane triangulations with 
at most n vertices. The algorithm uses O(n) space and generates such 
triangulations in O(1) time per triangulation without duplications. The 
algorithm does not output entire triangulations but the difference from 
the previous triangulation. By modifying the algorithm we can generate 
all biconnected based plane triangulations having exactly n vertices 
including exactly r vertices on the outer face in O(1) time per 
triangulation without duplications, while the previous best algorithm 
generates such triangulations in O(n^2) time per triangulation. Also we 
can generate without duplications all biconnected (non-based) plane 
triangulations having exactly n vertices including exactly r vertices on 
the outer face in O(r^2 n) time per triangulation, and all maximal planar 
graphs having exactly n vertices in O(n^3) time per graph. 



1 Introduction 

Generating all graphs with some property without duplications has many 
applications, including unbiased statistical analysis [M98]. Many algorithms 
to solve these problems are already known [A96, B80, M98, W86, etc.]. See also 
the textbooks [G93, KS98]. 

In this paper we wish to generate all biconnected “based” plane 
triangulations, which will be defined precisely in Section 2, with at most n 
vertices. Such triangulations play an important role in many algorithms, 
including graph drawing algorithms [CN98, FPP90, S90, etc.]. 

To solve these all-graph-generating problems some types of algorithms are 
known. 

Classical method algorithms [G93, p57] first generate all the graphs with the 
given property allowing duplications, but output a graph only if it has not 
been output yet. Thus this method requires a huge amount of space to store a 
list of graphs that have already been output. Furthermore, checking whether 
each graph has already been output requires a lot of time. 

Orderly method algorithms [G93, p57] need not store the list, since they 
output a graph only if it is the “canonical” representative of its isomorphism 
class. 



F. Orejas, P.G. Spirakis, and J. van Leeuwen (Eds.): ICALP 2001, LNCS 2076, pp. 433-^^^ 2001. 
(c) Springer- Verlag Berlin Heidelberg 2001 






Z. Li and S.-i. Nakano 



Reverse search method algorithms also need not store the list. The 
idea is to implicitly define a connected graph H such that the vertices of H 
correspond to the graphs with the given property, and the edges of H correspond 
to some relation between the graphs. By traversing an implicitly defined spanning 
tree of H, one can find all the vertices of H, which correspond to all the graphs 
with the given property. 

The main idea of our algorithm is that for some problems we can define a 
tree (not a general graph) as the graph H of reverse search method. Thus our 
algorithm does not need to find a spanning tree of H, since H itself is a tree. 
With some other ideas we give the following four simple but efficient algorithms. 

Our first algorithm generates all biconnected based plane triangulations with 
at most n vertices. A based plane triangulation means a plane triangulation with 
one designated “base” edge on the outer face. For instance there are four 
biconnected based plane triangulations with at most four vertices, as shown in 
Fig. 1(a). The base edges are depicted by thick lines. However, there are only 
three biconnected plane triangulations with at most four vertices. See Fig. 1(b). 
The algorithm uses O(n) space and runs in O(f(n)) time, where f(n) is the 
number of nonisomorphic biconnected based plane triangulations with at most 
n vertices. The algorithm generates triangulations without duplications. So the 
algorithm generates each triangulation in O(1) time on average. The algorithm 
does not output entire triangulations but the difference from the previous 
triangulation. 



Fig. 1. (a) Biconnected based plane triangulations, and (b) biconnected plane 
triangulations. 



By modifying our first algorithm we can generate without duplications all 
biconnected based plane triangulations having exactly n vertices including 
exactly r vertices on the outer face. The algorithm uses O(n) space and runs in 
O(f(n, r)) time, where f(n, r) is the number of nonisomorphic such 
triangulations. So the algorithm generates each triangulation in O(1) time on 
average, while the previous best algorithm generates such triangulations in 
O(n^2) time per triangulation. 
Also we can generate all biconnected (non-based) plane triangulations having 
exactly n vertices including exactly r vertices on the outer face in O(r^2 n) time 
(on average) per triangulation. Another algorithm with O(n^2) time per 
triangulation is also claimed in [M98] without detail but using a complicated 
theoretical linear-time plane graph isomorphism algorithm [HW74], while our 
algorithm is simple and does not need the isomorphism algorithm. 

Also we can generate all maximal planar graphs having exactly n vertices in 
O(n^3) time (on average) per graph. 

The rest of the paper is organized as follows. Section 2 gives some definitions. 
Section 3 shows a tree structure among biconnected based plane triangulations. 
Section 4 presents our first algorithm. By modifying the algorithm we give three 
more algorithms in Section 5. Finally Section 6 is a conclusion. 

2 Preliminaries 

In this section we give some definitions. 

Let G be a connected graph with n vertices. An edge connecting vertices x 
and y is denoted by (x, y). The degree of a vertex v is the number of neighbors 
of v in G. A cut is a set of vertices whose removal results in a disconnected 
graph or a single-vertex graph K_1. The connectivity κ(G) of a graph G is the 
minimum number of vertices in a cut. G is k-connected 
if k ≤ κ(G). 

A graph is planar if it can be embedded in the plane so that no two edges 
intersect geometrically except at a vertex to which they are both incident. A 
plane graph is a planar graph with a fixed planar embedding. A plane graph 
divides the plane into connected regions called faces. The unbounded face is 
called the outer face, and other faces are called inner faces. We regard the contour 
of a face as the clockwise cycle formed by the vertices and edges on the boundary 
of the face. We denote the contour of the outer face of plane graph G by C_o(G). 
A plane graph is called a plane triangulation if each inner face has exactly three 
edges on its contour. A based plane triangulation is a plane triangulation with 
one designated edge on the contour of the outer face. The designated edge is 
called the base edge. 

3 The Removing Sequence and the Genealogical Tree 

Let S_n be the set of all biconnected based plane triangulations with at most n 
vertices. In this section we explain a tree structure among the triangulations in 
S_n. 

Let G be a biconnected based plane triangulation having four or more 
vertices. Let C_o(G) = w_1, w_2, ..., w_k, and (w_1, w_k) be the base edge of G. 

A vertex w_s, 1 < s < k, on C_o(G) is removable if removing w_s from G 
preserves biconnectivity. Since G is a biconnected based plane triangulation, the 
resulting graph after removing a removable vertex v is also a biconnected based 
plane triangulation with the same base edge. 
An edge (w_i, w_j) in G is called a chord of G if i + 2 ≤ j. Intuitively, each 
chord is an edge connecting two non-consecutive vertices on C_o(G). Note, 
however, that the base edge (w_1, w_k) is also a chord. So G always has at least 
one chord. 

We have the following lemma. 

Lemma 1. Every biconnected based plane triangulation with four or more 
vertices has at least one removable vertex. 

Proof. Let G be a biconnected based plane triangulation having four or more 
vertices. Let (w_i, w_j) be a chord with the minimum j - i, where i < j. Then 
each w_s, i < s < j, is removable, because no cut consisting of exactly one vertex 
appears after removing w_s. □ 



If w_s is removable but w_2, w_3, ..., w_{s-1} are not, then w_s is called the 
leftmost removable vertex of G. We can observe that if w_s is the leftmost 
removable vertex then each of w_2, w_3, ..., w_{s-1} is an end of at least one 
chord. (So they are not removable.) 

For each triangulation G in S_n except K_3, if we remove the leftmost 
removable vertex then the resulting triangulation, denoted by P(G), is also a 
triangulation in S_n having one less vertex. Thus we can define the unique 
triangulation P(G) in S_n for each G in S_n except K_3. We say G is a child 
triangulation of P(G). 

Given a triangulation G in S_n, by repeatedly removing the leftmost removable 
vertex, we can have the unique sequence G, P(G), P(P(G)), ... of triangulations 
in S_n which eventually ends with K_3. By merging those sequences we can have 
the genealogical tree T_n of S_n such that the vertices of T_n correspond to the 
triangulations in S_n, and each edge corresponds to each relation between some 
G and P(G). For instance T_5 is shown in Fig. 2, in which each leftmost removable 
vertex is depicted by a white circle. We call the vertex in T_n corresponding to 
K_3 the root of T_n. 

Fig. 2. Genealogical tree T_5. 
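
The removing sequence can be tried out on a small explicit data structure. The following Python sketch is an illustration only, not the authors' implementation. It assumes a triangulation is stored as its outer contour w_1, ..., w_k plus, for every vertex, a neighbor list ordered from its contour predecessor through the interior to its contour successor; the names k3, add_vertex, leftmost_removable, and parent are hypothetical. It also assumes, following the observation above, that a contour vertex w_s with 1 < s < k is removable exactly when it is not an end of a chord.

```python
import copy

def k3():
    """K_3 with contour w_1, w_2, w_3 = 0, 1, 2 and base edge (w_1, w_3)."""
    return {"contour": [0, 1, 2], "fan": {0: [2, 1], 1: [0, 2], 2: [1, 0]}}

def add_vertex(g, i, j):
    """Return a copy with a new vertex v joined to w_i, ..., w_j (1-based, i < j)."""
    g = copy.deepcopy(g)
    c, fan = g["contour"], g["fan"]
    v = max(fan) + 1                 # fresh vertex id (assumption: integer ids)
    span = c[i - 1:j]                # w_i, ..., w_j
    fan[v] = list(span)
    fan[span[0]].append(v)           # v becomes the contour successor of w_i
    fan[span[-1]].insert(0, v)       # v becomes the contour predecessor of w_j
    for u in span[1:-1]:             # w_{i+1}, ..., w_{j-1} become inner vertices
        fan[u].append(v)
    g["contour"] = c[:i] + [v] + c[j - 1:]
    return g

def leftmost_removable(g):
    """1-based position s of the leftmost removable vertex, or None for K_3."""
    c, fan = g["contour"], g["fan"]
    if len(fan) == 3:
        return None
    on_contour = set(c)
    for s in range(2, len(c)):       # positions 2, ..., k - 1
        inner_side = fan[c[s - 1]][1:-1]
        if not any(x in on_contour for x in inner_side):  # w_s has no chord
            return s
    return None

def parent(g):
    """Return a copy with the leftmost removable vertex removed."""
    g = copy.deepcopy(g)
    s = leftmost_removable(g)
    c, fan = g["contour"], g["fan"]
    v = c[s - 1]
    fan_v = fan.pop(v)
    fan[fan_v[0]].pop()              # drop v from the list of w_{s-1}
    fan[fan_v[-1]].pop(0)            # drop v from the list of w_{s+1}
    for u in fan_v[1:-1]:            # inner neighbors of v surface on the contour
        p = fan[u].index(v)
        fan[u] = fan[u][p + 1:] + fan[u][:p]
    g["contour"] = c[:s - 1] + fan_v[1:-1] + c[s:]
    return g
```

Starting from k3(), two calls add_vertex(g, 1, 2) build a five-vertex triangulation, and two calls to parent walk its removing sequence back to K_3. Deep copies keep the sketch short; they are far from the constant-time updates the paper aims for.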

4 Algorithm 

Given S_n we can construct T_n by the definition, possibly with a huge space and 
much running time. However, how can we construct T_n efficiently given only an 
integer n? Our idea is to reverse the removing procedure as follows. 




Fig. 3. Illustration for G(i, j). 



Given a biconnected based plane triangulation G in S_n with at most n - 1 
vertices, we wish to find all child triangulations of G. Let C_o(G) = w_1, w_2, 
..., w_k, and (w_1, w_k) be the base edge of G, and w_s be the leftmost removable 
vertex of G. Since K_3 has no removable vertex, for convenience, we regard 
w_k (= w_3) as the leftmost removable vertex for K_3. We denote by G(i, j), 
1 ≤ i < j ≤ k, the based plane triangulation obtained from G by adding a new 
vertex v on the outer face of G, and adding j - i + 1 ≥ 2 edges (w_i, v), 
(w_{i+1}, v), ..., (w_j, v), as shown in Fig. 3. G(i, j) is a child triangulation 
of G if and only if v is the leftmost removable vertex of G(i, j). 

Since w_s is the leftmost removable vertex of G, each w_t, 1 ≤ t < s, has at least 
one chord (w_t, w_u) such that s < u. (Otherwise, w_s is not the leftmost removable 
vertex, a contradiction.) We denote by q(t) the largest index such that 
(w_t, w_{q(t)}) is a chord. 

We have the following four cases to consider. 

Case 1: j ≤ s. 

In this case v is the leftmost removable vertex of G(i, j). Thus P(G(i, j)) = G. 

Case 2: i < s < j. 

If j > q(i) then w_i is a removable vertex of G(i, j) to the left of v, so v is 
not the leftmost removable vertex of G(i, j), and P(G(i, j)) ≠ G. Otherwise, v 
is the leftmost removable vertex of G(i, j), and P(G(i, j)) = G. 

Case 3: i = s. 

If j = i + 1 then v is the leftmost removable vertex of G(i, j), and 
P(G(i, j)) = G. Otherwise j ≥ i + 2 holds; then w_s is a (possibly leftmost) 
removable vertex of G(i, j), and P(G(i, j)) ≠ G. 

Case 4: i > s. 

In this case v is not the leftmost removable vertex of G(i, j). Thus 
P(G(i, j)) ≠ G. 

Based on the case analysis above we can find all child triangulations of a given 
triangulation in S_n. If G has k child triangulations, then we can find them in 
O(k) time. This is an intuitive reason why our algorithm generates triangulations 
in O(1) time per triangulation. 

And recursively repeating this process from the root of T_n corresponding to 
K_3 we can traverse T_n without constructing the whole T_n. During the traversal 
of T_n, we assign a label (i, j) to each edge connecting G and G(i, j) in T_n, as 
shown in Fig. 2. Each label denotes how to add a new vertex to G to generate a 
child triangulation G(i, j), and each sequence of labels on a path starting from 
the root specifies a triangulation in S_n. For instance (1, 2), (1, 2) specify the 
leftmost triangulation in Fig. 2. During our algorithm we will maintain these 
labels only on the path from the root to the “current” vertex, because those are 
enough information to generate the “current” triangulation. To generate the next 
triangulation, we need to maintain some more information (the leftmost removable 
vertex w_s, and w_{q(t)} for each 1 ≤ t < s, etc.) only for the triangulations on 
the “current” path, which has length at most n. This is an intuitive reason why 
our algorithm uses only O(n) space, while the number of triangulations may not 
be bounded by a polynomial in n. 

Our algorithm is as follows. 

Procedure find-all-child-triangulations(G) 
begin 
    output G    { Output the difference from the previous triangulation } 
    if G has exactly n vertices then return 
    for i = 1 to s - 1 
        for j = i + 1 to s 
            find-all-child-triangulations(G(i, j))    { Case 1 } 
    for i = 1 to s - 1 
        for j = s + 1 to q(i) 
            find-all-child-triangulations(G(i, j))    { Case 2 } 
    find-all-child-triangulations(G(s, s + 1))    { Case 3 } 
end 

Algorithm find-all-triangulations(T_n) 
begin 
    output K_3 
    G = K_3 
    find-all-child-triangulations(G(1, 2)) 
    find-all-child-triangulations(G(2, 3)) 
    find-all-child-triangulations(G(1, 3)) 
end 
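
The traversal above can be sketched in Python without building any graph, because the case analysis only needs the contour length k, the position s of the leftmost removable vertex, and the indices q(t). This is a minimal sketch under assumptions, not the authors' code: the function name label_sequences is hypothetical, q is kept as a list of contour positions that is re-indexed for every child (so each step costs more than the O(1) of the paper, which keeps pointers on a stack), and K_3 is encoded with s = k = 3 and q(1) = q(2) = 3 so that the same loops yield its three children.

```python
def label_sequences(n):
    """Label sequences of all triangulations with at most n vertices (n >= 3)."""
    result = []

    def visit(num_vertices, k, s, q, labels):
        # k: contour length; s: position of the leftmost removable vertex;
        # q[t]: largest chord index of w_t for 1 <= t < s (q[0] is unused).
        result.append(tuple(labels))
        if num_vertices == n:
            return
        children = [(i, j) for i in range(1, s)
                    for j in range(i + 1, s + 1)]         # Case 1
        children += [(i, j) for i in range(1, s)
                     for j in range(s + 1, q[i] + 1)]     # Case 2
        if s < k:                                         # Case 3 (not for K_3)
            children.append((s, s + 1))
        for i, j in children:
            shift = i + 2 - j        # how positions of w_j, ..., w_k move
            prefix = min(i, s - 1)
            new_q = [0] + [q[t] + shift for t in range(1, prefix + 1)]
            if i == s:               # (w_s, w_{s+1}) becomes the chord of w_s
                new_q.append(s + 2)
            # The new vertex v sits at position i + 1 and is leftmost removable.
            visit(num_vertices + 1, k + shift, i + 1, new_q, labels + [(i, j)])

    visit(3, 3, 3, [0, 3, 3], [])
    return result
```

For n = 4 the sketch returns the empty sequence (for K_3) and the three one-label sequences, matching the four triangulations of Fig. 1(a). The children are visited in a slightly different order than in find-all-triangulations, which does not affect the set that is generated.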
Theorem 1. The algorithm uses O(n) space and runs in O(f(n)) time, where 
f(n) is the number of nonisomorphic biconnected based plane triangulations with 
at most n vertices. 

Proof. We need to maintain for the current triangulation (i) a doubly linked list 
of vertices on C_o, (ii) the leftmost removable vertex w_s, and (iii) w_{q(t)} for 
each 1 ≤ t < s. When we recursively call find-all-child-triangulations, we need 
to update (i)-(iii) above, and when we return from the recursive call we 
need to restore (i)-(iii) above. We can do each of these in O(1) time, as 
follows. 

We can update (i) easily. 

When we recursively call, one of Cases 1-3 occurs, and then the newly added 
vertex always becomes the leftmost removable vertex of G(i, j). Also by recording 
this update on a stack we can restore (ii) when the return occurs. Thus we can 
update (ii), too. 

Also, when we recursively call, if either Case 1 or 2 occurs, then we already 
have all of (iii), since (iii) of G(i, j) is a prefix of (iii) of G. Otherwise 
Case 3 occurs, and then we only need to set w_{s+1} of G as w_{q(s)} of G(i, j). 
Again by recording this update on a stack we can restore (iii) when the return 
occurs. 

Thus we can update (i)-(iii) in O(1) time. 

For the other parts our algorithm needs only a constant time of computation for 
each edge of the tree. Thus the algorithm runs in O(f(n)) time. 

For each recursive call we need a constant amount of space, and the depth 
of the recursive calls is bounded by n. Thus the algorithm uses O(n) space. □ 

5 Modification of the Algorithm 

Then we consider our second problem. 

A vertex v of G is called an inner vertex of G if v is not on C_o(G). Let 
S_{n-1}^{n-r} be the set of biconnected based plane triangulations having at most 
n - 1 vertices including at most n - r inner vertices. And let S_{=n}^{=n-r} be 
the set of biconnected based plane triangulations having exactly n vertices 
including exactly n - r inner vertices. 

We wish to generate all triangulations in S_{=n}^{=n-r} without duplications. 

For each triangulation G in S_{=n}^{=n-r}, if we remove the leftmost removable 
vertex v then the resulting triangulation P(G) is a triangulation in 
S_{n-1}^{n-r} having one less vertex. If v has exactly two neighbors on 
C_o(P(G)), then G has the same number of inner vertices as P(G); otherwise v has 
three or more neighbors on C_o(P(G)), and then G has more inner vertices than 
P(G) has. 

Also, for each triangulation G in S_{n-1}^{n-r} except K_3, if we remove the 
leftmost removable vertex then the resulting triangulation P(G) is also a 
triangulation in S_{n-1}^{n-r} having one less vertex and having no more inner 
vertices than G. 

Thus for each G in S_{=n}^{=n-r} ∪ S_{n-1}^{n-r} except K_3 we can again define 
the unique triangulation P(G) in S_{n-1}^{n-r}. Thus we again have the 
genealogical tree T_n^{n-r} such that (i) the leaf vertices of T_n^{n-r} 
correspond to the triangulations in S_{=n}^{=n-r}, (ii) the non-leaf vertices of 
T_n^{n-r} correspond to the triangulations in S_{n-1}^{n-r}, and (iii) each edge 
corresponds to each relation between some G and P(G). For instance, such a tree 
is shown in Fig. 4, in which each leftmost removable vertex is depicted by a 
white circle. The vertex in T_n^{n-r} corresponding to K_3 is called the root of 
T_n^{n-r}. 

Given a triangulation G in S_{n-1}^{n-r} we wish to find all child 
triangulations of G. Let C_o(G) = w_1, w_2, ..., w_k, and (w_1, w_k) be the base 
edge of G, and w_s be the leftmost removable vertex of G. We denote by G(i, j), 
1 ≤ i < j ≤ k, the based plane triangulation obtained from G by adding a new 
vertex v on the outer face of G, and adding j - i + 1 ≥ 2 edges (w_i, v), 
(w_{i+1}, v), ..., (w_j, v), as shown in Fig. 3. 

We have the following lemma. 




Fig. 4. Genealogical tree. 



Lemma 2. Let G be a based plane triangulation in S_{n-1}^{n-r}. (a) If G has at 
most n - 2 vertices then G has at least two child triangulations in 
S_{n-1}^{n-r}. (b) If G has exactly n - 1 vertices then G has at least one child 
triangulation in S_{=n}^{=n-r}. 

Proof. If G = K_3 then the claim holds. Assume otherwise. 

Let C_o(G) = w_1, w_2, ..., w_k, and (w_1, w_k) be the base edge of G. Let w_s be 
the leftmost removable vertex of G. 

(a) G(s - 1, s) and G(s, s + 1) are child triangulations of G and in 
S_{n-1}^{n-r}. 

(b) Let t be the number of inner vertices of G. By the definition of 
S_{n-1}^{n-r}, t ≤ n - r holds. Any child triangulation of G must have exactly 
n - r inner vertices by the definition of S_{=n}^{=n-r}. Thus we have to add a 
new vertex to G with exactly n - r - t + 2 edges to have n - r - t more inner 
vertices. Since v is the leftmost removable vertex of G(1, n - r - t + 2), 
G(1, n - r - t + 2) is a child triangulation of G and in S_{=n}^{=n-r}. □ 
Now we wish to find all child triangulations of G. We have the following two 
cases to consider. 

Let C_o(G) = w_1, w_2, ..., w_k, and (w_1, w_k) be the base edge of G, and w_s be 
the leftmost removable vertex of G. Let t be the number of inner vertices of G. 

Case 1: G has exactly n - 1 vertices. 

If t = n - r then only G(i, i + 1) such that 1 ≤ i ≤ s is a child triangulation 
of G. 

Otherwise t < n - r holds, and we need to add a new vertex to G with exactly 
n - r - t + 2 edges to have n - r - t > 0 more inner vertices. Now only G(i, j) 
such that (i) i < s, (ii) j ≤ q(i), and (iii) j - i - 1 = n - r - t, is a child 
triangulation of G. (If i ≥ s or j > q(i) then the new vertex v is not the 
leftmost removable vertex of G(i, j). And if j - i - 1 ≠ n - r - t then the 
resulting graph cannot have exactly n - r inner vertices.) 

Case 2: G has at most n - 2 vertices. 

If t = n - r then only G(i, i + 1) such that 1 ≤ i ≤ s is a child triangulation 
of G. 

Otherwise t < n - r holds, and we need to keep the number of inner vertices at 
most n - r after adding a new vertex to G. 

We have the following four subcases similar to the four cases in Section 4. 

Case 2(a): j ≤ s. 

If t + (j - i - 1) ≤ n - r then P(G(i, j)) = G, otherwise P(G(i, j)) ≠ G. 

Case 2(b): i < s < j. 

If j ≤ q(i) and t + (j - i - 1) ≤ n - r then P(G(i, j)) = G, otherwise 
P(G(i, j)) ≠ G. 

Case 2(c): i = s. 

If j = i + 1 (now t + (j - i - 1) ≤ n - r holds) then P(G(i, j)) = G, otherwise 
P(G(i, j)) ≠ G. 

Case 2(d): i > s. In this case P(G(i, j)) ≠ G. 

Based on the case analysis above we can have an algorithm to find all 
triangulations in S_{=n}^{=n-r}. We have the following lemma. 

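Lemma 3. The algorithm uses O(n) space and runs in O(f(n, r)) time, where 
f(n, r) is the number of nonisomorphic biconnected based plane triangulations 
having exactly n vertices including exactly r vertices on the outer face. 

Proof. By Lemma 2 the number of vertices in T_n^{n-r} is at most 
3 · |S_{=n}^{=n-r}| = 3 · f(n, r). (By Lemma 2(b) the tree vertices for 
triangulations with exactly n - 1 vertices number at most f(n, r), and by 
Lemma 2(a) every other non-leaf vertex has at least two children, so these 
number fewer than that.) And the algorithm needs only a constant time of 
computation for each edge of T_n^{n-r}. Thus the algorithm runs in O(f(n, r)) 
time. The algorithm clearly uses O(n) space. □ 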
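
The pruned traversal of this section can be sketched in the same style, again as an illustration under assumptions rather than the authors' implementation. The hypothetical function count_exact(n, r) tracks only the contour length k, the position s, the chord indices q(t), and the number t of inner vertices; it enumerates the children allowed in Section 4 and keeps those that satisfy the inner-vertex conditions of Cases 1 and 2 above. It assumes 3 ≤ r ≤ n.

```python
def count_exact(n, r):
    """Count based triangulations with exactly n vertices, r on the outer face."""
    target = n - r                   # required number of inner vertices
    count = 0

    def visit(num_vertices, k, s, q, t):
        nonlocal count
        if num_vertices == n:        # a leaf of the pruned genealogical tree
            count += (t == target)
            return
        children = [(i, j) for i in range(1, s)
                    for j in range(i + 1, s + 1)]         # j <= s
        children += [(i, j) for i in range(1, s)
                     for j in range(s + 1, q[i] + 1)]     # i < s < j <= q(i)
        if s < k:
            children.append((s, s + 1))                   # i = s, j = i + 1
        for i, j in children:
            t_new = t + (j - i - 1)  # the j - i - 1 covered vertices become inner
            if t_new > target:
                continue
            if num_vertices == n - 1 and t_new != target:
                continue             # the last step must reach exactly n - r
            shift = i + 2 - j
            prefix = min(i, s - 1)
            new_q = [0] + [q[x] + shift for x in range(1, prefix + 1)]
            if i == s:
                new_q.append(s + 2)
            visit(num_vertices + 1, k + shift, i + 1, new_q, t_new)

    visit(3, 3, 3, [0, 3, 3], 0)
    return count
```

As a sanity check, with r = n no inner vertices are allowed, so the leaves are the triangulations of a convex n-gon with a fixed base edge, and the counts follow the Catalan numbers.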

We modify our second algorithm so that it outputs all biconnected (non-based) 
plane triangulations having exactly n vertices including exactly r vertices on the 
outer face, as follows. 

At each leaf v of the genealogical tree T_n^{n-r}, we check whether the adding 
sequence of the triangulation G corresponding to v with its base edge is the 
lexicographically first one among the r adding sequences of G for the r choices 
of the base edge on C_o(G), and only if so G is output. Thus we can output only 
the canonical representative of each isomorphism class. 

Lemma 4. The algorithm uses O(n) space and runs in O(r^2 n · g(n, r)) time, 
where g(n, r) is the number of nonisomorphic biconnected (non-based) plane 
triangulations having exactly n vertices including exactly r vertices on the outer 
face. 

Proof. Given a biconnected based plane triangulation G, by counting (and 
updating) the number of chords incident to each vertex on C_o(G), we can find the 
adding sequence in O(n) time. For each triangulation corresponding to a leaf of 
T_n^{n-r} we construct r adding sequences for the r choices of the base edge on 
C_o(G), and find the lexicographically first one in O(rn) time, and for each 
output triangulation our tree may contain r isomorphic ones corresponding to the 
r choices of the base edge. Thus the algorithm runs in O(r^2 n · g(n, r)) time. 
The algorithm clearly uses O(n) space. □ 

A planar graph with n vertices is maximal if it has exactly 3n - 6 edges. 
Every maximal planar graph except K_3 is triconnected, and every triconnected 
planar graph has a unique embedding on a sphere up to mirror copy [HW74]. 
And in the embedding every face has exactly three edges on its contour. Thus, 
for every maximal planar graph G, by choosing the outer face and the base edge, 
there are exactly 2m (biconnected) based plane triangulations with exactly 3 
vertices on the outer face, where m is the number of edges of G. We modify the 
algorithm further as follows. At each leaf v of the genealogical tree 
T_n^{n-3}, we check whether the adding sequence of the triangulation G 
corresponding to v with its base edge is the lexicographically first one among 
the 2m adding sequences of G (3 choices of the base edge on C_o(G) for each of 
2m/3 choices of the outer face of G), and only if so G is output. Thus we have 
the following 
theorem, which is an answer for an open problem in 

Theorem 2. The modified algorithm generates all maximal planar graphs in 
O(n^3 · h(n)) time, where h(n) is the number of nonisomorphic maximal planar 
graphs with exactly n vertices. The algorithm uses O(n) space. 
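
The bound in Theorem 2 can be unpacked as follows, using the same accounting as in the proof of Lemma 4 (each adding sequence is found in O(n) time). The intermediate quantities are spelled out here for illustration only and are not stated in this form in the original text.

```latex
\begin{align*}
m &= 3n - 6, \qquad \#\text{faces} = \tfrac{2m}{3} = 2n - 4,\\
\text{cost of the check at one leaf} &= 2m \cdot O(n) = O(n^2),\\
\text{leaves per maximal planar graph} &\le 2m = O(n),\\
\text{total time} &= O(n^2) \cdot O(n) \cdot h(n) = O(n^3 \cdot h(n)).
\end{align*}
```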



6 Conclusion 

In this paper we have given four simple algorithms to generate all graphs with 
some property. Our algorithms first define a genealogical tree such that each 
vertex corresponds to each graph of the given property, then output each graph 
without duplications by traversing the tree. 

Finding other all-something-generating problems to which our method can 
be applied remains an open problem. 